Evolutionary Generalisation and Genetic Programming

Ibrahim KUSCU
Submitted for the degree of DPhil
University of Sussex
September 1998
Declaration
I hereby declare that this thesis has not been submitted, either in the same or a different form, to this
or any other university for a degree.
Signature:
Acknowledgements
This is a list of my publications which are used as supporting material
in various parts of this thesis.

Ibrahim Kuscu, Promoting generalization of learned behaviours in genetic programming.
In A. E. Eiben, editor, Fifth International Conference on Parallel Problem Solving from
Nature, Amsterdam, 27-30 September 1998.

Ibrahim Kuscu, Evolving a generalised behavior: Artificial ant problem revisited. In V.
William Porto, editor, Seventh Annual Conference on Evolutionary Programming, Mission
Valley Marriott, San Diego, California, USA, 25-27 March 1998. Springer-Verlag.

Ibrahim Kuscu, A method of promoting generalisation in genetic programming. In John
R. Koza, Wolfgang Banzhaf, Kumar Chellapilla, Kalyanmoy Deb, Marco Dorigo, David B.
Fogel, Max H. Garzon, David E. Goldberg, Hitoshi Iba, and Rick Riolo, editors, Genetic
Programming 1998: Proceedings of the Third Annual Conference, page 192, University of
Wisconsin, Madison, Wisconsin, USA, 22-25 July 1998. Morgan Kaufmann.

Ibrahim Kuscu, Evolution of Learning Rules for Hard Learning Problems, in Proceedings
of the Fifth Annual Conference on Evolutionary Programming, J. McDonnell, B. Reynolds and
D. Fogel (Eds.), 1996, MIT Press.

Ibrahim Kuscu, Evolutionary and Incremental Methods to Solve Hard Learning Problems, in
Proceedings of the First International Genetic Programming Conference, Stanford University,
Koza et al. (Eds.), 1996, MIT Press, USA.

Ibrahim Kuscu, Genetic Programming and Incremental Approaches to Solve Supervised
Learning Problems, in Proceedings of ICML96, Evolutionary Computing and Machine Learning
Workshop, T. Fogarty and G. Venturini (Eds.), 1996, Bari, Italy.

Hilan Bensusan and Ibrahim Kuscu, Constructive Induction using Genetic Programming,
in Proceedings of ICML96, Evolutionary Computing and Machine Learning Workshop, T.
Fogarty and G. Venturini (Eds.), 1996, Bari, Italy.

Ibrahim Kuscu, Simple Genetic Programming for Supervised Learning Problems, in Proceedings
of the Fifth Turkish Symposium on Artificial Intelligence and Neural Networks, E.
Alpaydin and U. Cilingiroglu and F. Gurgen and C. Guzelis (Eds.), 1996, Bogazici Uni.
Press, Istanbul.

Ibrahim Kuscu and Chris Thornton, Design of artificial neural networks using genetic
algorithms: review and prospect, in Proceedings of the Third Turkish Symposium on Artificial
Intelligence and Neural Networks, Bozsahin, C. et al. (Eds.), 1994. This is also printed as
COGS Research Paper 319.
Research Papers

Ibrahim Kuscu, Evolution of Learning Rules for Supervised Tasks I: Simple Learning Problems,
University of Sussex, COGS, CSRP-394, 1995.

Ibrahim Kuscu, Evolution of Learning Rules for Supervised Tasks II: Hard Learning Problems,
University of Sussex, COGS, CSRP-395, 1995.

Ibrahim Kuscu, Incrementally Learning the Rules for Supervised Tasks: The Monk's Problems,
University of Sussex, COGS, CSRP-396, 1995.
Abstract
A closer examination of learning research in Artificial Intelligence reveals that the views of
learning held by researchers using genetic based methods differ from the views
employed by traditional Machine Learning (ML) research. Genetic based learning tends to
favour continuous learning with self-organising learning systems, and is mainly performance
or problem-solving oriented. ML research, on the other hand, tends to develop systems with human-crafted
internal structures, which are strong in reaching the goals set by the theoretical work in the
area. One such goal, establishing the necessary theoretical grounds, aims at developing learning
systems which generalise what they learn to new but relevant tasks.
The thesis aims at bridging the gap between traditional and genetic based learning research by
promoting interaction between the two with respect to performance expectations from a learner
in the form of generalisation. In searching for this beneficial interaction, the thesis explores the
tendency in genetic based methods towards memorisation (i.e., simple look-up table) and compression
(i.e., a compact re-representation) oriented learning, and emphasises the necessity and the
requirements for generalisation (i.e., predictive accuracy in responding to unseen cases) oriented
learning.
A particular emphasis is given to a sub-area of genetic based learning research called genetic
programming (GP). After identifying the lack of proper consideration of generalisation in GP,
several experiments involving both supervised learning problems and simulations of learned behaviours
are developed in order to explore the ways in which the generalisation performance of
the solutions produced by GP can be improved. The findings of these GP experiments show that
borrowing some of the principles from traditional learning research offers significant
improvements in approaches to learning in the form of evolutionary generalisation. One of the
experiments suggests that generalisation of learned behaviours is possible by using a training
regime based on environment sampling. Another set of experiments suggests that generalisation in
GP can be improved by selection of a set of non-problem-specific functions. Finally, beyond
improving on the standard applications of GP, a set of experiments shows how GP can be used
to improve the performance of other learners such as back-propagation.
Of the many possible ways of establishing a beneficial interaction with traditional learning methods,
only a few could be presented in this study. There remains, however, a clear need and rich
potential for future improvements in the area, which are also presented in this thesis.
Contents
1 Introduction 1
1.1 Evolutionary Approaches to Learning and Generalisation.............3
1.2 Learning Problems and Experiments........................5
1.3 Scope and Limitations...............................7
1.4 Summary of the Chapters..............................8
2 Learning and Generalisation 10
2.1 Introduction.....................................10
2.2 What is Learning?..................................11
2.3 Learning Research in AI..............................11
2.4 Learning Paradigms.................................12
2.5 Computational Learning..............................13
2.5.1 PAC Learning...............................14
2.5.2 Vapnik-Chervonenkis (VC) Dimension..................15
2.5.3 Cross Validation..............................16
2.6 A Framework for Learning.............................17
2.7 The Issue of Generalisation.............................20
2.8 Evolutionary Generalisation............................23
2.8.1 Why is Evolutionary Generalisation Important?..............25
2.9 Conclusions.....................................26
3 Relational Learning Problems 27
3.1 Introduction.....................................27
3.2 Relational Learning Problems...........................28
3.3 Are Relational Problems Generalisation Problems?................29
3.4 Three Monks Problems...............................30
3.4.1 Training and Testing Sets..........................32
3.5 Parity Problems...................................33
3.6 Review of Methods and Systems Exploiting Relational Effects..........33
3.6.1 Back-propagation on Relational Problems.................33
3.6.2 Other Methods for Solving Relational Problems..............34
3.7 Conclusion.....................................36
4 Genetic Algorithms, Classifier Systems and Generalisation 37
4.1 Introduction.....................................37
4.2 Genetic Algorithms:an Overview.........................37
4.3 Classifier Systems.................................40
4.4 Generalisation and Classifier Systems.......................41
4.4.1 Predictive Accuracy vs Generalisation in CS...............42
4.5 Conclusions.....................................45
5 Genetic Programming and Generalisation 46
5.1 Introduction.....................................46
5.2 Genetic Programming................................46
5.2.1 Recent Advances in GP..........................48
5.3 Generalisation in Genetic Programming......................48
5.3.1 Generalisation of Solutions to Problems with Input-Output Mappings..49
5.3.2 Generalisation of Solutions to Problems with Simulated or Real Robot
Behaviours.................................51
5.3.3 Summary on the Generalisation and GP..................53
5.4 Learning Bias,GP and Generalisation.......................53
5.5 Proposed Methods to Promote Generalisation in GP................54
5.5.1 Promoting Generalisation in Supervised Learning.............55
5.5.2 Promoting Generalisation in GP for Simulated Behaviours: the Artificial
Ant Problem................................57
5.6 Conclusion.....................................58
6 Evolutionary Generalisation for Learned Behaviours 59
6.1 Introduction.....................................59
6.2 Artificial Ant Problem...............................59
6.3 Previous Experiments and Generalisation.....................60
6.4 Understanding the Environment..........................62
6.5 Experimental Setup.................................63
6.6 Generalising to SantaFe-Like Trails........................64
6.7 Looking for Predictive Generalisation.......................66
6.8 Training Using Multiple Environments......................68
6.9 A Method for Obtaining Generalisations......................70
6.9.1 How Large Should the Training Set Be for a Generalisation Performance? 71
6.10 Conclusion.....................................73
7 Evolutionary Generalisation for Supervised Learning Problems 75
7.1 Introduction.....................................75
7.2 A Simple Genetic Programming Model......................76
7.2.1 The Encoding Scheme...........................76
7.2.2 Genetic Aspects of the Simple GP Model.................77
7.3 Evolution of Learned Rules for Some Simple Problems..............80
7.3.1 OR,AND and not-XOR problems.....................81
7.3.2 Eight Linearly Separable Tasks......................82
7.4 Evolving for Generalisation:The Simple GP Model on the Monks Problems...85
7.5 Discussion on the Simple GP Model........................90
7.6 Conclusions.....................................91
8 Evolutionary Generalisation for Relational Learning Problems 92
8.1 Introduction.....................................92
8.2 Monk-2 Problem..................................92
8.2.1 Results on Monk-2 Problem........................93
8.3 Parity problems...................................96
8.3.1 Experimental Setup.............................96
8.3.2 Results on Parity problems.........................96
8.3.3 Computational Cost............................99
8.4 Conclusions.....................................100
9 Incremental Model 101
9.1 Introduction.....................................101
9.2 Random Search...................................101
9.3 Incremental Hill Climbing.............................102
9.3.1 The Encoding Scheme...........................103
9.3.2 Forming Incremental Representations...................103
9.4 Results of Incremental Model...........................104
9.4.1 Cost of Evaluation.............................105
9.4.2 Discussion on the Incremental Model...................106
9.5 Conclusions.....................................108
10 A Genetic Constructive Induction Model 109
10.1 Introduction.....................................109
10.2 Finding New Attributes...............................110
10.3 Experimental Setup.................................111
10.3.1 The Genetic Search Model.........................112
10.3.2 Selected Learner..............................113
10.3.3 Performance of Selected Learner......................113
10.4 An Initial Experiment................................116
10.5 Higher Bit Parity problems.............................118
10.6 Cost of GCI.....................................120
10.7 Discussion on GCI.................................121
10.8 Conclusions.....................................122
11 Conclusions 124
11.1 Suggested Improvements on the Approaches to Evolutionary Generalisation in GP 125
11.2 A Hybrid Method is Useful for Evolutionary Generalisation............128
11.3 A Critical Look at the Thesis and the Further Research..............128
11.4 Summary of Principal Contributions........................130
Bibliography 132
A Extra Figures of Chapter 6 137
B Extra Figures of Chapter 8 145
C Extra Figures of Chapter 9 151
D Extra Figures of Chapter 10 156
List of Figures
4.1 A classifier system..................................41
6.1 The SantaFe Trail...................................60
6.2 Characteristics of the SantaFe Trail.........................62
6.3 Problem summary for the Artificial Ant problem..................64
6.4 Generalisation performance of best,average,and worst individuals on all the
SantaFe family trails................................65
6.5 Basic primitives of unseen trails..........................66
6.6 Generalisation performance of the best (a), average (b) and worst (c) individuals on
the trails with unseen characteristics........................67
6.7 A set of training trails................................68
6.8 Generalisation performance of individual trained on 5 trails............69
6.9 Generalisation performance of 15 individuals over random trails.........71
6.10 Average generalisation performances of individuals trained on varying numbers of
randomly generated trails (the score of 1.0 indicates perfect performance)....73
7.1 Tree representation of an expression.........................77
7.2 Rank based selection.................................79
7.3 Crossover and mutation algorithm..........................80
7.4 Problem summary for OR,AND,not-XOR problems...............81
7.5 Fitness cases for OR,AND,not-XOR problems..................81
7.6 Four supervised tasks of Chalmers' experiments..................83
7.7 Problem summary for Monks problems......................86
8.1 Lil-gp parameters used for the Monk-2 problem...................93
8.2 Problem summary for Monk-2 problem......................94
8.3 Training and testing performances on Monk-2 problem...............95
8.4 Problem summary for Parity problems.......................97
8.5 Training and testing performances on Parity-5 problem using problem-specific
(boolean) functions..................................98
8.6 Training and testing performances on Parity-5 problem using non-problem-specific
functions.......................................98
8.7 Training and testing performances on Parity-6 problem using problem-specific
(boolean) functions..................................99
8.8 Training and testing performances on Parity-6 problem using non-problem-specific
functions.......................................99
9.1 Training and testing performances on the Monk-2 problem using random search.102
9.2 Training and testing performances on the Monk-2 problem using incremental search.106
9.3 Individuals processed for the Monk-2 problem using incremental search......106
10.1 Overview of basic steps in GCI model........................112
10.2 Lil-gp parameters used for GCI model........................112
10.3 Quick-prop parameters used in GCI model.....................114
10.4 Performances of quick-prop on parity problems without additional attributes...115
10.5 RMSE of quick-prop on parity-5 with original features...............115
10.6 RMSE of quick-prop on parity-6 with original features...............116
10.7 Percent correct of quick-prop on parity-5 with original features..........116
10.8 Percent correct of quick-prop on parity-6 with original features..........117
10.9 Results summary for parity-4 problem.......................118
10.10 Problem summary for Parity problems.......................118
10.11 Performances of quick-prop on higher-bit parity problems with additional attributes.119
10.12 RMSE of quick-prop on parity-5 with additional features..............119
10.13 RMSE of quick-prop on parity-6 with additional features..............120
10.14 Percent correct of quick-prop on parity-5 with additional features........120
10.15 Percent correct of quick-prop on parity-6 with additional features........121
A.1 Parameters used for the Artificial Ant Problem (see Koza 1992, Chapter 7)....138
A.2 SantaFe family trails.................................139
A.3 Generalisation performance of all individuals on all the SantaFe family trails...140
A.4 Trails with unseen characteristics..........................141
A.5 Generalisation performance of all individuals on trails with unseen characteristics.142
A.6 Rotated SantaFe Trail................................143
A.7 Generalisation performance of all individuals on rotated SantaFe trails.......143
A.8 The Pets trail exhibiting increasingly difficult primitives of trails.........144
A.9 Generalisation performance of all individuals on the Pets trail...........144
B.1 Generational view of training and testing performances for 24 runs of Monk-2
problem........................................146
B.2 Generational view of training and testing performances for 30 runs of parity-5
problem with boolean functions...........................147
B.3 Generational view of training and testing performances for 30 runs of parity-5
problem using non-problem-specific functions....................148
B.4 Generational view of training and testing performances for 30 runs of parity-6
problem with boolean functions...........................149
B.5 Generational view of training and testing performances for 24 runs of Parity-6
problem using non-problem-specific functions....................150
D.1 A typical run showing the performance of quick-prop on parity-4 without addi-
tional attributes....................................157
D.2 One of the best performing runs showing the performance of quick-prop on parity-
5 without additional attributes............................158
D.3 A typical run showing the performance of quick-prop on parity-6 without addi-
tional attributes....................................159
D.4 A couple of examples of additional attributes for solving parity-4 (UXOR stands
for the XOR function defining two-bit parity)....................160
D.5 A typical run showing the performance of quick-prop on parity-5 without addi-
tional attributes....................................161
D.6 Best of run solutions for additional attributes for parity-5.............162
D.7 A typical run showing the performance of quick-prop on parity-6 with additional
attributes.......................................163
D.8 Best of run solutions for additional attributes for parity-6..............164
List of Tables
3.1 Selected Results (in percentages) of Comparison Experiment (Thrun et al. '91)..32
4.1 Crossover operator (the marker shows the crossover point, chosen randomly).....39
4.2 Examples of classifiers. A colon separates conditions from actions.........40
6.1 Scores (in percentage) with varying training sizes.................72
7.1 Best performances in percentages.........................86
8.1 Training and Testing performances on Monk-2 problem..............94
9.1 Best performances in percentages.........................104
9.2 Comparison of Incremental Model with Results of Best Performing Learning Methods
(Thrun et al. '91)................................105
C.1 Best 30 performances and number of individuals evaluated using incremental model
on Monk-2 problem.................................155
Chapter 1
Introduction
Machine Learning (ML) research has always been one of the more active areas in Artificial
Intelligence (AI). Its aim is to create computer programs that automatically improve their performance
through experience. It includes an immense diversity of approaches (e.g., inductive, deductive,
analogical, etc.) and paradigms (e.g., connectionist, symbolic and genetic based). Good introductions
to the area can be found in Michalski, Carbonell, and Mitchell (1983), Michalski, Carbonell,
and Mitchell (1986) and Carbonell (1990). Shavlik and Dietterich (1992) provide a collection of
core reading papers and a good overview of various methods. A common agreement among ML
researchers is that learning is a prerequisite for intelligence and therefore must be investigated
in depth. A theoretical foundation for the field has been established by research in Computational
Learning Theory (Kearns, 1990) (see also chapter 2, section 2.4).
Investigation of learning in AI has developed in parallel with the major paradigms of AI, such
as symbolic, connectionist and evolutionary methods (descriptions of these paradigms are deferred
until chapter 2). A careful examination of the research in learning reveals a significant
gap between the conventional learning methods (i.e. connectionist or symbolic) and evolutionary
methods. This may be due to the fact that evolutionary methods have a shorter history than the
conventional methods. The course of research on the conventional methods reflects comparatively
well established theoretical and practical approaches to many aspects of developing
artificial learning systems.
One of the important aspects of learning systems is the performance assessment
of a learner. Among the few alternative performance assessment methods, measuring the generalisation
performance of the learner is the most important and most widely used. (Although
an in-depth account of generalisation is given in chapter 2, a brief description is given here for
clarity.) Most of the common practices of conventional methods use a learning process which
is carried out in two stages: training and testing. In the training stage the learner is expected to
learn a task based on some examples or instances relevant to the task. In the testing stage the
learner's performance is assessed by trying it on some new instances relevant to the same task.
The important point, in this case, is that the learner is expected not to simply memorise the instances
of the task presented during training, but rather to learn them in such a way that what
is learned can also be applied to new and relevant instances. The success of the learner in
finding correct outputs for these new instances is sometimes described as its predictive accuracy.
In this way, the learner is involved in an inductive process, and such learning is most useful in many
cases where the complete set of instances defining a task may not be available. This is especially the
case for real world learning problems. Thus, as will be presented in detail in the next chapter,
assessment of a learner based on generalisation performance is a very significant aspect of
developing artificial learning systems.
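The two-stage regime can be illustrated with a small sketch. The toy parity task, the deliberately naive look-up-table learner, and all names below are illustrative assumptions, not material from the thesis; the point is only to make the gap between memorisation and predictive accuracy visible.

```python
# A minimal sketch of the train/test regime described above. The
# "learner" memorises training pairs in a look-up table and guesses the
# majority training output for anything unseen.

def train(examples):
    """Build a look-up table plus a majority-output default."""
    table = dict(examples)
    outputs = [y for _, y in examples]
    default = max(set(outputs), key=outputs.count)
    return table, default

def predictive_accuracy(model, test_examples):
    """Fraction of instances answered correctly."""
    table, default = model
    hits = sum(1 for x, y in test_examples if table.get(x, default) == y)
    return hits / len(test_examples)

# Toy task: odd parity of 2-bit inputs, with one case held out for testing.
training = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1)]
testing = [((1, 1), 0)]

model = train(training)
print(predictive_accuracy(model, training))  # 1.0: the training set is memorised
print(predictive_accuracy(model, testing))   # 0.0: memorisation fails on the unseen case
```

The memoriser scores perfectly on its training instances yet fails on the held-out case, which is exactly why testing on new instances, rather than training performance, is the appropriate measure.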
Conventional learning methods have already established some effective principles and methods
for designing learning systems and measuring generalisation performance. Evolutionary
methods, however, are relatively younger. Besides, due to the different nature of the paradigm (see
next section), common practices and attitudes regarding learning may have resulted in different approaches
and methods for developing artificial learning systems. It may be due to these different
approaches and methods that a gap between the research in evolutionary methods and conventional
methods with regard to developing effective learning systems seems to have occurred. One of the
major contributions to the investigation of learning might come from bridging the gap between
conventional learning and evolutionary methods and promoting a beneficial interaction between
them.
This thesis contains details of such an investigation in one of the sub-areas of evolutionary learning
research known as genetic programming (GP) (Koza, 1992). The thesis focusses on probably
the most important aspect of learning: generalisation, which is an important measure of a learner's
performance used by the conventional learning methods. The example learning problems are drawn
from simulated learning behaviours and from a class of supervised learning
problems called relational problems (see chapter 3), which have been shown to be very difficult
even for conventional learning methods to solve with a good generalisation performance
(Thrun, Bala, Bloedorn, Bratko, et al., 1991).
Over the course of the thesis several major points for improving the generalisation performance of
the solutions produced by GP will be developed. These points include the following:
1. providing an account of the theoretical and experimental approaches of conventional learning
methods with respect to generalisation and how these approaches relate to evolutionary
methods, especially to GP.
2. in relation to the first point, presenting the weak points and the lack of proper consideration
of generalisation in evolutionary methods (specifically, in GP) from both paradigmatic
and practical viewpoints.
3. suggesting several methods and approaches by which a beneficial interaction between conventional
learning methods and evolutionary learning methods can be achieved, and showing
that such interaction can help to develop better and more formal methods of building genetic based
learning systems with improved generalisation performance.
4. in order to support the methods and approaches proposed, developing several experiments
and providing evidence in favour of such an interaction that can help to develop GP based
learning systems with significantly improved generalisation performance. These experiments
will involve several easy and difficult supervised learning problems as well as simulations
of learned behaviours.
This chapter provides a light introduction to the subject and the argument of the thesis. Most
technical issues and definitions are deferred until the next chapter. The next section contains a brief
glance at the evolutionary methods and their approaches to learning. It also provides general
guidance with regard to the motivations and the goals of the thesis. Section 1.2 introduces the
experiments and the problems to be used in them. It briefly introduces the particular
area of research and the problems to be dealt with in this thesis. In section 1.3, a few paradigmatic
issues will be clarified in order to determine the boundaries of the research and the style of
approaching the problems. Finally, in the last section, summaries of the forthcoming chapters
will be presented.
1.1 Evolutionary Approaches to Learning and Generalisation
In the evolutionary context, it is helpful to identify two general levels of adaptation.
1. Learning: adaptation at the level of the individual; i.e. an individual entity
adapts to its environment over time.
2. Evolution: adaptation at the level of the population; i.e. through an evolutionary history the
members of a population acquire an improved fitness in a particular environment.
Following the work of Holland (1975) on adaptive systems, evolutionary approaches in computational
models of learning have gained acceptance. Holland invented a method based on natural
selection and natural genetics. This method, later called genetic algorithms (GAs), provides a
simple model of evolution.
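The basic GA cycle of fitness evaluation, selection, crossover and mutation over a population of bit strings can be sketched as follows. The one-max task (maximising the number of 1s in a string) and all parameter values here are illustrative choices for the sketch, not Holland's original formulation.

```python
import random

random.seed(1)  # reproducible illustration

LENGTH, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 40, 0.02

def fitness(bits):
    # Illustrative "one-max" task: fitness is the number of 1s.
    return sum(bits)

def select(pop):
    # Binary tournament: the fitter of two random individuals is chosen.
    a, b = random.choice(pop), random.choice(pop)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # One-point crossover at a randomly chosen position.
    point = random.randrange(1, LENGTH)
    return p1[:point] + p2[point:]

def mutate(bits):
    # Flip each bit independently with small probability.
    return [b ^ 1 if random.random() < MUTATION_RATE else b for b in bits]

pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop = [mutate(crossover(select(pop), select(pop)))
           for _ in range(POP_SIZE)]

best = max(pop, key=fitness)
print(fitness(best))
```

Note that the loop measures progress only on the single task the population is evolved for, an observation that becomes important in the discussion of generalisation below.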
The use of GAs in learning can broadly be classified into two classes: genetic based machine
learning (DeJong, 1988b) and evolutionary and adaptive learning systems (Meyer & Guillot,
1991). In the former, GAs are used to solve those problems which would normally be tackled by
symbolic and connectionist methods. In the latter, GAs are used to develop autonomous/adaptive
systems (e.g., animats/robots) which learn through continuous interaction with the environment
(such as in Artificial Life research).
Some research on evolving computer programs concentrates on solving a variety of ML problems.
Much of the current work in the area follows the work of Koza (1992) under the name of
genetic programming (GP). By using a variable-length LISP-like representation, Koza extended
the work of Cramer (1985), which demonstrated that parse trees could provide a natural representation
for evolving programs. Koza applied this technique to a broad range of problems from ML
(see chapter 5 for a more detailed review of the method).
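To make the parse-tree idea concrete, a program can be encoded as nested tuples and evaluated recursively. This encoding, the function set, and the names below are illustrative assumptions for the sketch, not Koza's actual LISP S-expression machinery.

```python
import operator

# Function set for internal nodes; terminals are input names or constants.
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate(tree, inputs):
    """Recursively evaluate a parse tree against a dict of input values."""
    if isinstance(tree, str):            # terminal: named input variable
        return inputs[tree]
    if isinstance(tree, (int, float)):   # terminal: constant
        return tree
    op, left, right = tree               # internal node: (function, arg, arg)
    return FUNCTIONS[op](evaluate(left, inputs), evaluate(right, inputs))

# The expression x * x + 1 as a parse tree.
program = ("add", ("mul", "x", "x"), 1)
print(evaluate(program, {"x": 3}))  # → 10
```

The genetic operators then act on such trees directly; crossover, for instance, exchanges randomly chosen subtrees between two parent programs, which keeps offspring syntactically valid.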
GP has also been tested on a set of example problems from recent research on learning focussing on
developing adaptive systems in the areas of artificial life and behaviour based robotics (Meyer &
Guillot, 1991). According to this research, the traditional knowledge based approaches to learning
show fundamental deficiencies for the purposes of robotics. Traditional AI techniques focus on
simulating cognitive abilities purely in terms of symbol manipulation. They require that the
design of appropriate robot behaviour and adaptation takes place in the minds of the designers.
Evolutionary and adaptive systems, by contrast, defend the idea that a learning system should possess
adaptive power in order to respond effectively to changes in the environment. For example, a
simulated animat's adaptive behaviour enables it to continue to perform the functions for which it
was built in a particular environment.
In general, the application of GAs in learning emphasises an adaptive systems perspective where
the focus is on developing systems that are capable of making changes to themselves over time
with the goal of improving their performance on the tasks confronting them in a particular environment
(DeJong, 1988a). The models developed are performance (or task accomplishment) oriented
and focus on learning as a change in performance over time rather than on changes to internal
structures. Learning is viewed as a continuous process, and the environment in which a system
performs is often dynamic. It is preferable for a system to respond to such changes by continuous
learning in the environment, rather than requiring manual intervention in the design process so that the
system can anticipate possible changes. The internal state of an adaptive system is considered a
black box, and as such is not directly accessible to an outside teacher. Modification through
interpreting and integrating advice or any form of feedback is carried out by the adaptive system
itself.
The positive aspects of adaptive systems, such as self-modification through continuous interaction
with the environment, have yielded new understanding and research directions in the field
of AI (e.g., artificial life research). Since learning is viewed as the change in performance over
time on the task that an adaptive system is evolved for, the measure of performance evaluation (i.e.,
how well the learner has learned) is often based on whether the system is achieving a predefined
task or not. A GA based system would typically start from a random initial population of possible
solutions, which in most cases would be far from containing the desired solution, and by means of
genetic search this initial state of the system will move towards a state which contains the desired
solution over generations. The performance of such a system is considered as progressing from
the random state to the state containing the solution over time. Such a performance measure does
not explicitly account for predictive accuracy and the ability to generalise to novel circumstances,
as required in the learning framework adopted by the conventional learning methods. In fact, in
most cases adaptive systems' success comes from the fact that they have exhibited a goal behaviour or
found a solution for the environment that they were evolved for. Unfortunately, this approach does
not compare to the idea of training and testing, which forms an essential basis as a performance
measure in ML research. In ML terminology, adaptive learning systems can be said to have passed
the training process, but their performance on similar tasks or environments remains to be tested.
As seems to be common practice in GA based experiments, a solution found after an evolutionary
process is often hoped to be general (see chapter 4 regarding the generalisation issue in the
genetic based learning system called the classifier system, and chapters 5 and 6 for genetic programming
examples). This thesis provides experimental evidence in support of the argument that
such a practice may frequently result in non-general and brittle solutions (see chapters
4 and 6).
In genetic programming (GP) (Koza, 1992) assessment of the learner in terms of generalisation performance seems to have been neglected over the past years. For example, in (Koza, 1992) almost all of the problems which use data sets are solved by making the complete instance space available as data to the learner (except the time series prediction problem on pp. 245-255). All of the other problems, involving simulations of animat or robot behaviours, rely on the very optimistic hope that evolving solutions in a static environment may result in general solutions. In fact, in some GP experiments (see the review in chapter 5) it seems that a simple problem-solving approach is taken when a consideration of generalisation would be more appropriate (i.e., getting an animat to perform a target behaviour for one particular set-up) (Koza, 1992). A number of similar approaches in GP, as well as in other genetic-based methods, are reviewed later in various sections of chapters 4, 5 and 6. One of the aims of the thesis is to explicitly state the importance of the issue of generalisation for artificial learning systems and to show that it does not receive the attention it deserves in genetic-based systems, in particular in GP. Several experimental approaches will then be presented to show that borrowing some methods from conventional learning research can be useful in improving the generalisation of solutions produced by GP.
In summary, the paradigmatic nature of GA systems, reflected by the characteristics of self-organisation and continuous interaction with the environment, is considered a fundamental difference between evolutionary learning methods and conventional learning methods. Acceptance of such a difference as a favourable aspect of GA-based systems by some practitioners of GA research seems to have influenced the ways in which GA-based learning systems are built. For example, some GA-based approaches, such as GP, have not effectively shared the developments in the other learning paradigms of AI. Moreover, the approaches to learning seem to have been considered in parallel with the strong and distinctive characteristics of the paradigm. Thus, the perception of learning in some GA-based systems may be limited by distinctive characteristics of the paradigm such as self-organisation and continuity. This, at least in the experimental approaches of GP, seems to have resulted in a view of learning where the focus is just to solve a problem. The performance assessment used in this view, as this thesis argues, falls short of the performance measures (such as generalisation) established by the conventional learning methods. However, such distinctive paradigmatic views should not prevent a beneficial interaction with other paradigms of AI. Ideally, for example, a learner would be one which is capable of self-organising through continuous interaction with the environment and which can successfully generalise its experience to novel (but similar) situations. In this thesis, several approaches will be developed to bridge the gap between the understanding of learning and generalisation in the conventional methods and in the evolutionary methods. These approaches will use a set of problems where the importance of interaction is highest. One set of problems involves simulation of learned behaviours and the other includes relational learning problems. The next section introduces these problems and the thesis proposals to solve them.
1.2 Learning Problems and Experiments
The issue of generalisation in GP can be studied in terms of two broad classes of problems: problems involving simulations of behaviour learning, and supervised learning problems. These classes reflect two different problematic approaches to learning and to the assessment of a learner's performance. For those problems involving simulations, often a single static environment is used and the learner's performance is measured in terms of how well the learner performs a goal behaviour in this environment (see chapters 5 and 6). Such an approach can easily result in learners which learn particular characteristics of that environment. When tested in a similar environment, the learner exhibits a great degree of brittleness and failure to generalise to these novel (but similar) environments. The thesis presents a set of fundamental deficiencies in the current practices of GP in finding general solutions for problems involving simulated learning behaviours and, taking an example problem, suggests a method for promoting generalisation based on environment sampling. The method is constructed by borrowing some principles from conventional ML research.
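As a hedged sketch of the environment-sampling idea (the function name and the problem-specific `evaluate` scorer are illustrative assumptions, not the thesis implementation), fitness can be averaged over a random sample of environments instead of being measured in one fixed environment:

```python
import random

def fitness_on_sample(individual, environments, sample_size, evaluate, seed=None):
    """Score an individual on a random sample of environments rather than
    a single fixed one, so that evolution cannot overfit to one layout.
    `evaluate(individual, env)` is an assumed problem-specific scorer."""
    rng = random.Random(seed)
    sample = rng.sample(environments, sample_size)
    return sum(evaluate(individual, env) for env in sample) / sample_size
```

Because each evaluation can draw a different sample, an individual that merely memorises one environment's particular layout no longer scores well consistently.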
Learning problems using data sets (i.e., supervised problems) are considered in terms of the degree of difficulty of finding general solutions for them. Some problems are easier to solve with successful generalisation than others, for both conventional learning and evolutionary methods. The solution to such problems may often be determined directly by a correlation between the values of certain input(s) and output(s), and as such they may be ideal cases for many of the available learning methods to produce general solutions. The thesis presents another class of supervised learning problems (see chapter 3) called relational learning problems. These problems are characterised by the fact that their solutions may be determined by a relationship among the values of the input(s), which in turn may determine the value of the output(s). The thesis does not attempt to present a clear distinction between relational problems and other, easy-to-solve problems. Given that the universe of learning problems may be placed on a range from easy to difficult according to the typical performance of well-known learning methods, relational learning problems may be considered as being closer to the 'difficult' end of the range. In fact, relational learning problems pose severe challenges to conventional learning mechanisms (Thrun et al., 1991). As is known, and will be shown in this thesis, connectionist and symbolic learning mechanisms are often unable to find generalising solutions to these problems. These problems will be the focus of our attention for two reasons:
1. Since any advancement in learning research has an important influence on other areas of AI research, it is important to investigate the nature of relational learning problems and their possible solutions. The contribution of the thesis will be directed to the investigation of GP-based solutions to relational learning problems and to comparing these with the achievements of the conventional methods.
2. Since the solution to these problems requires the discovery of a possible re-coding that may be hidden in the input/output mapping, and such a discovery is hard, they constitute a proper benchmark for the investigation of generalisation in terms of predictive accuracy. Using evolutionary methods, these problems are often solved by presenting the complete set of instances defining a problem to the learner (i.e., separate training and testing cases are not used; see for example (Koza, 1992; Goldberg, 1989), presenting solutions to parity and multiplexer problems, which are typical examples of relational learning problems). In this way what is achieved may not be the discovery of the relationship among the inputs but a solution of the problem by finding a compact representation of the instances (i.e., a compression-based solution). Thus, attempting to solve the relational learning problems where some of the instances are randomly chosen as training cases and the rest as testing cases can help us to make a clear distinction between compression-based learning and learning based on predictive accuracy (see chapter 2 for the definition of the terms).
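The random split described in point 2 can be sketched as follows (a minimal illustration assuming a binary attribute space; the function name is mine, not the thesis's):

```python
import itertools
import random

def split_instances(n_bits, train_fraction=0.5, seed=0):
    """Enumerate the complete instance space of an n-bit Boolean problem
    and split it at random into training and testing sets, so that
    generalisation can be measured on instances the learner never saw."""
    instances = list(itertools.product([0, 1], repeat=n_bits))
    rng = random.Random(seed)
    rng.shuffle(instances)
    cut = int(len(instances) * train_fraction)
    return instances[:cut], instances[cut:]
```

A learner that merely compresses the training half gains nothing on the held-out half; only a discovered relationship among the inputs carries over.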
The thesis proposes several new approaches to finding general solutions for relational learning problems using GP. These methods include the use of more general, not-so-problem-specific ('non-problem-specific' from here onwards) functions (as opposed to a common belief in GP that the use of problem-specific functions should be favoured), the use of an incremental hill-climbing method, and the use of a GP/back-propagation hybrid. The results of the experiments using these approaches are compared with the results of conventional learning methods in terms of both generalisation performance and computational cost.
The relational learning problems chosen for experimentation include the parity problems and the three Monks problems (Thrun et al., 1991). In the case of even parity, the rule for the true cases requires that an even number of 1s exist among the binary input values. Thus the value of the output depends on the value of a computation involving the values of the inputs. The three Monks problems have been used to compare the performance of well-known learning algorithms (Thrun et al., 1991). Among them, Monk-2 is an example of a relational learning problem. The rule for the true cases of this problem requires that, out of the six input attributes (with small integer values), exactly two must have the value 1. It is similar to the parity problems. It has been shown in (Thrun et al., 1991) that only back-propagation and AQ17 can solve it with 100 percent success on testing. However, back-propagation requires a binary conversion of the data set, and AQ17 has a mechanism which tests the number of attributes having a particular value. In this respect, back-propagation did not actually solve the problem using the original problem definition, and AQ17 could only solve it with the help of the bias introduced. Though bias is sometimes useful, in this thesis I will argue that solving relational learning problems with the help of such human intervention does not provide a substantial contribution to research on learning. Instead, the problems should be solved in their originally defined form, and a mechanism such as the one used by AQ17 should be discovered. Using several genetic programming implementations presented in chapters 7 through 10, it will experimentally be shown that such a discovery is possible.
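The two target rules can be stated directly in code (a sketch for illustration; the attribute encoding follows the standard Monks data set description):

```python
def even_parity(bits):
    """Even-parity target: true when an even number of 1s occurs
    among the binary inputs."""
    return sum(bits) % 2 == 0

def monk2(attributes):
    """Monk-2 target: exactly two of the input attributes take the
    value 1, whatever values the other attributes take."""
    return sum(1 for a in attributes if a == 1) == 2
```

In both cases no single input correlates with the output; only a relation computed over all of the inputs does, which is what makes these problems relational.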
1.3 Scope and Limitations
As mentioned in the previous sections, learning in this thesis involves such improvement of an artificial system that, once it has learned a behaviour or a task, it can use it to respond properly to unseen instances of the behaviour or task. This is different from learning based on memorisation (or compression) of the complete instances of a particular task. In the latter, what is presented to the system is successfully learned. In the former, however, learning involves the successful prediction of novel instances based on previous training. This is the ability to generalise. In some cases, where the problem is not complex, compression-based learning might also yield a successful ability to generalise. In this thesis such cases will be shown, based on experiments involving simple supervised learning problems. However, the primary goal of the thesis is to solve relational learning problems. It will be shown that their solution requires a learner with an ability to generalise, not merely an ability to compress.
As previously noted, conventional methods of solving supervised learning problems show poor performance on relational learning problems. The selection of genetic-based methods to solve supervised problems may create an incompatibility between the paradigm and the problems. Genetic algorithms use reinforcement which encourages positive moves towards the target solution without explicitly referring to a scalar measure of performance in relation to the target. GAs generally work by punishing the performance of unfit individuals in relation to the goal. In this respect the feedback provided does not contain explicit information about the target solution. In supervised learning, on the other hand, the learner is provided with a set of examples of correct information in the form of an input/output mapping during the training period. In this respect, the teacher provides some explicit information about the target solution, and the overall behaviour of the learner is measured by a scalar measure of error from the target solution. Thus, in a sense, using GAs for supervised learning problems means giving away some of the already provided information on the way towards the solution. It can be inefficient to have to train all the individuals of a population in every generation with a set of example instances. On simple problems the issue can be ignored (DeJong, 1988a). However, the experiments in this thesis showed that such inefficiency can be an important issue in solving relational learning problems.
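The inefficiency is easy to make concrete with a back-of-envelope count (the run sizes below are illustrative, not taken from the thesis experiments):

```python
def supervised_gp_evaluations(population_size, generations, training_cases):
    """Number of fitness-case evaluations needed when every individual in
    every generation is scored on every training case."""
    return population_size * generations * training_cases

# A modest run, e.g. 500 individuals over 50 generations on the 256
# instances of an 8-bit parity problem, already needs millions of
# fitness-case evaluations.
total = supervised_gp_evaluations(500, 50, 256)
```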
1.4 Summary of the Chapters
Introduction. Chapter 1 provided the basic introduction to the problem of learning and generalisation in evolutionary approaches (especially in genetic programming), the thesis arguments and approaches, the scope and limitations, and the chapter summaries.
Learning and Generalisation. Chapter 2 presents detailed descriptions and definitions of learning, generalisation and evolutionary generalisation. It provides a review of essential aspects of relevant research and forms a basis for the arguments and experiments of further chapters.
Relational Learning Problems. Chapter 3 includes a description and examples of relational learning problems. These examples constitute the problems which are used in the experiments of the thesis. It also includes reviews of some symbolic and connectionist methods (such as BACON, GLAUBER and back-propagation) which capture relational regularities in a limited manner.
Genetic Algorithms, Classifier Systems and Generalisation. Chapter 4 presents brief reviews of GAs and classifier systems and a discussion of generalisation versus compression in classifier systems. It provides further examples and explanations of the terms compression and generalisation within the evolutionary learning context.
Genetic Programming and Generalisation. Chapter 5 contains reviews of GP and of the issue of generalisation in GP, and introduces the new approaches of the thesis to improve on the generalisation problem of GP. It distinguishes two different kinds of learning problems, supervised (i.e., problems using input/output data mappings) and simulation-based, and proposes the thesis approaches to improve generalisation in solving them. These proposals include (1) using random sampling from the environment for simulation-based problems, (2) using non-problem-specific functions and (3) a GP/back-propagation hybrid to improve generalisation for supervised learning problems. The rest of the thesis presents experiments using these methods.
Evolutionary Generalisation for Learned Behaviours. Chapter 6 presents one of the proposed methods for improving the generalisation of evolved behaviours in simulated environments. The experiment uses the well-known artificial ant problem. Its results indicate that, contrary to the optimistic claims in the GP literature, solutions found for this problem using a single, static environment may be non-general and brittle. Further experiments in this chapter propose a generalisation method involving environmental sampling which significantly increases the generalisation performance of the solutions found.
Evolutionary Generalisation for Supervised Learning Problems. Chapter 7 introduces a simple GP model implemented in LISP for evolving specific learning rules for simple and relational learning problems. This model uses functions which are not so problem-specific. It shows clearly that the generalisation problem is of crucial importance in the case of relational learning problems (i.e., the Monk-2 problem). It also shows that solving supervised learning problems using evolutionary methods (evaluating every individual on every training instance) may be computationally costly.
Evolutionary Generalisation for Relational Learning Problems. Chapter 8 presents a lil-gp (a GP development package) implementation of experiments for relational learning problems such as the Monk-2 and parity problems. Experiments in this chapter present the use of non-problem-specific functions and their impact on improving generalisation performance when solving relational learning problems. The parity problems will also be attempted using an incomplete mapping (i.e., splitting the data into training and testing sets). The results suggest that the use of non-problem-specific functions may improve generalisation performance in solving relational problems. Probably for the first time in the literature, a good generalisation performance on parity problems will be reported in this chapter.
An Incremental Model. Chapter 9 describes an incremental model for finding learning rules. This model compensates for the problems encountered in the previous model in that it increases the likelihood of finding a solution which can learn the training cases in every run. It is less costly, but it still suffers from not being able to produce successful generalisation performance. The chapter also includes a comparison of the method with random search.
A Genetic Constructive Induction Method. Chapter 10 introduces a genetic programming and back-propagation hybrid: a constructive induction method to solve relational learning problems. In this chapter, it is first shown that relational learning problems are hard to solve using a conventional learning method, back-propagation. GP is used to code for a composition of the original attributes. Based on this encoding, a useful set of additional features is expected to evolve, making a general solution to relational learning problems possible using back-propagation.
Conclusion. The last chapter presents concluding remarks and a discussion of the ideas and contributions developed in the thesis. It also includes suggestions for further research.
Chapter 2
Learning and Generalisation
2.1 Introduction
In the previous chapter the broad framework and the arguments of the research developed in this thesis were presented. It was stated that there seems to exist a significant gap between the approaches of the conventional methods and the evolutionary methods to learning, in terms of both paradigmatic and practical issues. The gap reveals itself most explicitly in one of the important aspects of building artificial learning systems: performance assessment of the learner. It was pointed out that while the conventional methods have established some firm grounds in developing principles for measuring a learner's performance, the evolutionary methods, being a younger discipline and having a different paradigmatic nature, were suggested to share these developments while retaining their own positive attitudes toward learning. Within this broad view, a contribution to the investigation of learning in AI is suggested through promoting a beneficial interaction between conventional and evolutionary methods. The issue of generalisation, as an important and widely accepted performance measure in conventional learning methods, is identified as a possible subject of such interaction. Although how evolutionary methods may benefit from this interaction was briefly introduced in the previous chapter, much of the relevant detail was left untouched.
In this chapter, the broad view of the previous chapter will be detailed. The important aspects of learning and generalisation as developed by research in the conventional methods will be clarified and related to the evolutionary methods, with the hope of identifying the ways in which a beneficial interaction can be promoted. The chapter starts by presenting a working definition of learning and a brief overview of learning paradigms in AI. It also presents a formal learning paradigm called probably approximately correct (PAC) learning, which is a generally accepted framework in conventional learning research. This serves to form a basis for the notion of evolutionary generalisation developed later in the chapter. Next, a learning framework is developed in order to relate some concepts developed by conventional learning research to evolutionary learning. Finally, a detailed account of the thesis's understanding of generalisation and evolutionary generalisation is presented.
2.2 What is Learning?
Over many years, research in Artificial Intelligence (AI) has demonstrated that learning is a vital part of intelligence. Learning exists in almost every sub-domain of AI (e.g., in planning, natural language processing, control, etc.). A common characteristic of these domains is that they all require some form of learning component which aids in finding ways of acquiring and representing knowledge about a domain, organising and storing it, and using the knowledge in order to perform a particular task.
Despite the close link between learning and AI, providing a definition for learning has been as difficult as proposing a definition for intelligence. One good definition can be found in (Langley, 1996) and is as follows:
Learning is improvement of performance in some environment through acquisition of knowledge resulting from experience in that environment.
In this definition 'performance' refers to a measure of behaviour on a task, such as accuracy or efficiency. 'Environment' refers to an external setting to be learned. It is important that the environment contains some kind of regularity for learning to occur. 'Knowledge' is an internally stored data structure. 'Experience' refers to a process of converting received inputs into outputs. Finally, 'improvement' refers to desirable changes in the performance.
Whatever the definition of learning is, it must take into account some way of evaluating the performance of the learner. Only in this way can the existence of learning be justified. Evaluation of performance requires that the state of a learner (in relation to its expected performance) before and after learning can be measured and compared in an objective way. Such a measurement process enables us to observe that the learner has either achieved a predefined goal (or at least improved towards achieving it) or failed to accomplish it. For example, generalisation is an essential performance measure for many tasks that learners are expected to achieve. In fact, it is the expectation of generalisation that makes what a learner does count as valuable learning.
2.3 Learning Research in AI
Learning research in AI mainly deals with creating real-world learning in artificial environments such as computer simulations or real robots. It also helps us to understand the learning processes existing in the real world. An empirical branch of artificial learning aims at understanding the characteristics of learning algorithms and domains, and how they relate to learning, by experimenting with different learning algorithms and problems. Other researchers deal with the mathematical and theoretical aspects of artificial learning in order to formulate and prove theorems about the nature of learning problems and algorithms, and to explore the difficulty of the tasks and of the methods used to solve several classes of learning problems (Kearns & Vazirani, 1994). Some of the findings of artificial learning research have been successfully applied to solve real-world problems.
At the heart of artificial learning research are several learning algorithms which characterise three different paradigms, developed together with the corresponding paradigms of AI: symbolic, connectionist (or artificial neural network) and genetic-based research.
2.4 Learning Paradigms
Research in machine learning is diverse. It involves several lines of research which are similar in the designation of learning systems' goals and the evaluation of performance, but dissimilar in other respects such as the assumptions about representation, performance methods and the learning algorithms employed. A general approach to classifying these paradigms is in parallel with the paradigms of AI: the symbolic machine learning techniques, the neural network or connectionist approach, and evolution-based genetic algorithms. In the following paragraphs I provide a brief overview of the paradigms of learning, not only from the point of view of the basic paradigms of AI but from a slightly more detailed view of artificial learning itself (an in-depth analysis of learning research paradigms can be found in (Michalski et al., 1986), and a different view in terms of different methods of learning is presented in (Langley, 1996)).
• Symbolic learning: symbolic machine learning techniques are those which have received the greatest attention from learning researchers. They can be classified according to underlying learning strategies such as rote learning, learning by being told, learning by analogy, learning from examples, and learning from discovery (Shavlik & Dietterich, 1992).
Among these techniques, learning from examples appears to be the most promising symbolic machine learning technique. It aims to induce a general concept description that best describes the positive and negative examples. A similar framework, called case-based or instance-based learning, stores cases or specific experiences as knowledge and tries to apply them to new situations, relying on flexible matching methods to retrieve them. Another approach, called rule induction, employs condition-action rules, decision trees or similar logical structures and attempts to create summaries of the different classes in the input data. Similarly, some analytical learning methods also represent knowledge in terms of rules (or theorems) and try to find solutions by searching for proofs of the theorems at hand. The major goal of the analytical learners is improving the efficiency of the search for solutions (Michalski et al., 1986).
One of the well-known examples of symbolic learners is Quinlan's ID3 decision-tree building algorithm and its descendants (Quinlan, 1986, 1993). ID3 takes objects of known classes, described as instances with a fixed collection of properties or attributes, and produces a decision tree which correctly classifies all the given instances. ID3 has been used successfully in various classification and prediction applications.
• Neural Networks: the neural network (or connectionist) paradigm has also attracted significant attention in the learning community. In symbolic machine learning, knowledge is represented in the form of symbolic descriptions of concepts. In connectionist learning, knowledge is represented by one or more layers of an (artificial) network of interconnected nodes. Learning algorithms are applied to adjust the connection weights so that the network can predict or classify previously unseen examples (of the same problem) correctly. There are many sorts of artificial neural networks (ANNs). Some of the more popular ones include the multi-layer perceptron, learning vector quantization, radial basis function, Hopfield and Kohonen networks (Ripley, 1995). ANNs in which data can only be processed from the input nodes towards the output are called feed-forward networks. When nodes implement some sort of feedback to previous nodes (or to themselves), the networks are called recurrent. Another way of looking at ANNs is through the training regimes employed. In some cases, the output part of the training data is presented to the ANN during training, in which case the ANN employs a supervised learning regime. In other cases, the output is not available to the ANN; the learning regimes in these cases are referred to as unsupervised or self-organizing. Unsupervised algorithms essentially perform a clustering of the data into similar groups based on the input features.
Among the many artificial neural network algorithms, back-propagation has been extremely popular. Back-propagation (Rumelhart, Hinton, & Williams, 1986b) works on a fully connected, layered, feed-forward network. Activations spread from the input layer to the output layer (possibly by passing through a hidden layer). Back-propagation typically starts by assigning a random set of weights to the connections in the network. The algorithm adjusts the weights according to the input instances in a supervised learning framework. Each input instance is processed in two stages: a forward pass and a backward pass. The back-propagation algorithm updates the weights incrementally until the network stabilizes, which is when learning is expected to have occurred. The network may then be tested on some unseen examples of the problem.
• Genetic Algorithms (GAs): this refers to a class of algorithms which rely on analogies to natural processes and the Darwinian survival of the fittest. It is based on the principles of natural evolution, where a population of individuals (string representations, known as genotypes, of potential solutions, which are known as phenotypes) undergoes a sequence of genetic operators (i.e., mutation, where some random changes to these individuals are introduced, and crossover, where two chosen individuals called parents are allowed to create offspring by exchanging portions of themselves, in its simplest form from a random point) (Holland, 1975). These individuals compete for survival: selecting fitter individuals to undergo the genetic operators is a way of producing the individuals of the next generation. After some number of generations in which fitter and fitter individuals are allowed to survive, the members of the population hopefully get closer to the target solution; possibly, the best individual represents the target solution. In this respect, GAs are mostly used within the framework of reinforcement learning (Barto & Sutton, 1982). A well-known learning algorithm in this class is probably the Classifier System (Goldberg, 1989), which represents knowledge in terms of parallel, interconnected production rules. A detailed account of GAs and classifier systems is presented in chapter 4. Another line of research in this domain is genetic programming (Koza, 1992), which uses a variable-length representation of simple 'program' instructions, aiming to solve a variety of learning problems. In chapters 5 and 6, I will present a more detailed overview of the learning algorithms in this class.
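The selection-crossover-mutation cycle described above can be sketched minimally (a toy illustration over bit strings; tournament selection and the parameter values are my assumptions, not prescribed by the works cited):

```python
import random

def evolve(fitness, genome_len=16, pop_size=30, generations=40,
           p_mut=0.02, seed=0):
    """A minimal generational GA over bit strings: tournament selection,
    one-point crossover, per-bit mutation. `fitness` maps a bit list to
    a number to be maximised."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)          # fitter of two random individuals
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, genome_len)              # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g   # per-bit mutation
                     for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = evolve(sum)  # maximise the number of 1s (the "one-max" toy problem)
```

Note that the only feedback the population receives is the relative fitness ordering; no individual is ever told what the target string looks like, which is the reinforcement-style character of GAs mentioned above.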
The adequacy of the techniques of symbolic, connectionist or genetic-based learning is somewhat related to assumptions about the simplicity of the problem domains. If a set of simple, easy-to-solve problems is chosen, most of these methods may seem adequate for finding solutions to many learning problems. However, attempts to improve learning methods by solving more problems from difficult domains can produce more progress in learning research. For this reason, dealing with difficult problem domains is necessary to develop a better understanding of the challenges facing research on artificial learning systems. As introduced in the previous chapter, relational problems are one such challenge for learning research.
2.5 Computational Learning
Some learning researchers, called 'computational learning theorists', concentrate on explaining the general principles of artificial learning through theoretical analysis and mathematics. They aim to establish sound theoretical foundations concerning the nature of classes of learning problems and the algorithms designed to solve them.
One of the important products of computational learning research is 'probably approximately correct (PAC) learning' (Valiant, 1984). It provides a learning framework based on a mathematical analysis of the learning process (readers interested in the mathematical and theoretical aspects of learning are referred to (Kearns & Vazirani, 1994)). PAC learning is normally introduced for binary classification tasks but can be extended to real-valued and noisy problem settings (Kearns & Vazirani, 1994). The problem setting for PAC learning is presented in the following section in order to establish a ground for the learning scenario and the notion of generalisation dealt with in this thesis.
2.5.1 PAC Learning
In PAC learning a learner's task is to learn some target concepts (or functions) given a set of instances (examples) of such concepts. The instances are provided in the form of input-output mappings. Let X refer to the set of all possible instances over which some target concepts may be defined. For example, X may refer to the set of all people described by sex (male/female) and employment status (employed/unemployed). Let C refer to a set of target concepts to be learned by a learner L. Each target concept c in C corresponds to a subset of X. For example, one concept c in C may refer to 'housewives'. PAC learning theory assumes that a set of instances from X is generated at random according to an unknown distribution D. Each of these instances is labeled
as positive or negative examples of target concept c.For example,if x,randomly chosen from X,
is a true example of the target concept c,then c
￿
x
￿ ￿
1 but if x is not an example of the concept
c then c
￿
x
￿ ￿
0.In this manner,a set of x is picked from X at random according to a Þxed
but unknown distribution D.Then these examples and their target outputs ( c
￿
x
￿
) are presented to
the learner. The learner considers a set, H, of possible hypotheses when attempting to learn the target concept. After observing those random examples of the target concept, the learner L must output a hypothesis h in H as an estimate of c. The success of the learner is evaluated by the performance of h on some new instances drawn randomly from X according to the distribution D, the same distribution as the one used for generating the training instances. The new instances are presented to the learner without their target values, and the learner L is expected to guess these target values using the hypothesis h. We are interested in how closely the learner's output hypothesis h approximates the actual concept c. The probability that h will misclassify new instances in the testing process is called the true error of the hypothesis. In general, the goal of learning is to reduce this error and obtain good generalisation. This depends on two factors. First, given a set of training instances, there is usually more than one hypothesis which can fit (explain) the training instances, and the learner will not always output the hypothesis which results in the minimum true error. Second, it depends on how representative the randomly drawn training instances are of the target concept. Since, under normal conditions, providing a learner with a perfect set of training instances which would minimise the true error is unrealistic, the expectation from the learner needs to be relaxed. Thus, under the PAC framework the learner is expected to probably learn a hypothesis (during training) which approximates the target concept (during testing). For this reason the framework is called Probably Approximately Correct learning.
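The criterion just described can be stated more formally. Following the notation above (in the style of presentation of (Kearns & Vazirani, 1994)), the true error of a hypothesis h with respect to the target concept c and distribution D, and the PAC requirement on the learner, can be written as:

```latex
% True error of hypothesis h:
\mathrm{error}_D(h) \;=\; \Pr_{x \sim D}\bigl[\, h(x) \neq c(x) \,\bigr]

% PAC requirement: for accuracy parameter \epsilon and confidence
% parameter \delta, the learner must output a hypothesis h with
\Pr\bigl[\, \mathrm{error}_D(h) \leq \epsilon \,\bigr] \;\geq\; 1 - \delta
```

Here 'approximately' corresponds to the accuracy parameter ε, and 'probably' to the confidence parameter δ.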
It should be noted that the PAC framework contains an implicit assumption that the hypothesis
space H contains an approximate hypothesis for every target concept in C. It is commonly accepted that this is difficult to assure if one does not know C in advance and it is not certain that H can express X. If this assumption does not hold, it would be unfair to expect any learning method to support accurate generalisation from a reasonable number of training examples.
The PAC learning framework is mainly used to analyse the difficulty of a class of learning problems. It is often used to answer an important question: how large should the training data be for a class of learning problems so that it is PAC learnable? (see the next section on the VC dimension). However, the thesis's interest in PAC learning is mainly to borrow a formal basis for analysing the issue of generalisation. The PAC framework suggests that a learner should first be trained and then its performance should be evaluated based on its success in testing over unseen instances. It also suggests that testing instances should be generated in the same way as the training instances. So, during the course of the thesis the concepts of training, testing and generalisation used are those developed within the framework of PAC learning. Throughout the thesis, the terms 'unseen cases or examples' or 'test cases' will refer to those generated under the PAC assumptions unless otherwise stated.
In this respect, the performance measure is the ability of the learner to generalise: the predictive accuracy of the learner in mapping unseen input cases to outputs with a satisfactory degree of correctness. In general, the degree of success in the testing phase is considered to be satisfactory when a sufficiently high proportion of unseen inputs is classified correctly. Thus, a change or improvement in the performance of the learner based on a learning process where the learner is provided with the complete instance space (I will call this exhaustive encounter) and does not use training and testing data or phases does not count as interesting or valuable learning in the context of this thesis. What is actually involved in such cases is a kind of rote learning: i.e., the derivation of a memorisation or a compressed (or compact) representation of the instance space (in the worst case, a look-up table). These concepts are clearly defined and explained later in this chapter.
2.5.2 Vapnik-Chervonenkis (VC) Dimension
In the previous section the framework for PAC learning was presented. The framework provides useful insights regarding the difficulty of various classes of learning problems and the rate at which generalisation performance improves as the number of training examples increases. The number of training examples required by a learner is often named the sample complexity (SC) of the learning problem. This is an important issue because in most practical situations the success of a learner may be limited by the availability of data.
PAC learning theory normally implies that SC can be computed in terms of the size of the hypothesis space H (for the mathematics of this computation please refer to (Kearns & Vazirani, 1994)). However, SC increases with the size of H, and in some cases computing the required number of training examples using this strategy may lead to weak results. For example, when H contains an infinite number of hypotheses, computation of SC in this way is impossible (Mitchell, 1997).
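For a finite H and a learner that outputs a hypothesis consistent with the training data, the standard bound of this kind on the number m of training examples (as presented in (Mitchell, 1997)) is:

```latex
m \;\geq\; \frac{1}{\epsilon}\left(\ln|H| \;+\; \ln\frac{1}{\delta}\right)
```

The dependence on ln |H| is what makes this strategy weak for large hypothesis spaces and inapplicable for infinite ones.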
Another way of approaching the computation of sample complexity is presented under the Vapnik-Chervonenkis (VC) dimension (Kearns & Vazirani, 1994). The VC dimension looks at the hypothesis space not in terms of the number of distinct hypotheses that it contains, but rather in terms of those instances from X that can be completely separated (shattered) using H. Thus, the VC dimension of a
hypothesis space H defined over X is the size of the largest finite subset of X that can be shattered by H (for the mathematical expression of the VC dimension see (Kearns & Vazirani, 1994)). This definition presents a measure of the capacity of H to represent target concepts defined over an instance space X. This measure also helps in computing what size of a set of randomly drawn training examples should be sufficient to PAC-learn any concept in C. Using the VC dimension, lower and upper bounds on the size of the training set can be computed for any desired level of accuracy. Importantly, this implies that if the training examples are not sufficient, no learning method can PAC-learn every target concept in any non-trivial C.
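The upper bound referred to here can be made explicit. A sufficient number m of random training examples for PAC learning, stated in terms of the VC dimension (a standard bound as presented in (Mitchell, 1997)), is:

```latex
m \;\geq\; \frac{1}{\epsilon}\left(4\log_2\frac{2}{\delta} \;+\; 8\,\mathrm{VC}(H)\log_2\frac{13}{\epsilon}\right)
```

A corresponding lower bound, also in terms of the VC dimension of C, shows that no learner can PAC-learn with substantially fewer examples.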
The thesis's interest in the VC dimension is to promote awareness of the issues developed in conventional learning methods. This can help us find ways of borrowing this methodology in order to improve on explorations of evolutionary generalisation. The issues raised here will be revisited when we discuss generalisation and relational learning problems. In practice, however, the use of the VC dimension in the thesis is very limited. One of the reasons for this limited use is that the thesis is mostly focused on empirical explorations of evolutionary generalisation rather than on developing theories about it. Nevertheless, the empirical approaches in this thesis will benefit from these theories in order to be methodologically sound. The second reason is related to the fact that the problems used for experimentation are chosen from benchmark learning problems in the literature, and as such many assumptions about these problems are already known. Also, for purposes of comparison, the selection of training and testing cases for the Monk problems will follow the original research done previously (Thrun et al., 1991).
2.5.3 Cross Validation
Cross-validation is a method of estimating the degree of generalisation based on 're-sampling'. The data set defining a problem is divided into several subsets. The learner is then trained on a number of different data sets, but each time some of the data is omitted in order to measure the performance of the learner. The technique is widely used in the artificial neural network community. The resulting measures, such as the generalisation error, are often used to choose which network structure will be good for the problem at hand.
There are many ways of conducting cross-validation. For example, in k-fold cross-validation, the data is divided into k subsets of more or less equal size. The learner is then trained k times, each time leaving out one of the subsets from training, and only the omitted subset is used to compute the performance of the learner. When k equals the sample size, this is called 'leave-one-out' cross-validation.
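As a concrete illustration, the k-fold procedure just described can be sketched as follows; the function names and the callable interface for the learner are illustrative assumptions, not code from the thesis:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the n example indices and split them into k
    roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(examples, train, test_error, k):
    """Train k times, each time holding out one fold for testing;
    return the mean error over the k held-out folds."""
    folds = k_fold_indices(len(examples), k)
    errors = []
    for fold in folds:
        held = set(fold)
        train_set = [e for i, e in enumerate(examples) if i not in held]
        test_set = [examples[i] for i in fold]
        h = train(train_set)              # the learner returns a hypothesis
        errors.append(test_error(h, test_set))
    return sum(errors) / k
```

Setting k equal to the number of examples gives leave-one-out cross-validation as a special case.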
It should be noted that cross-validation is different from 'splitting' the data set into separate subsets (also called the 'hold-out' method, commonly used for early stopping when the learner is an artificial neural network). In the hold-out method, only a single subset, called a validation set, is used to estimate the performance of the learner, rather than a number of different subsets. In the thesis, especially when developing a GP/back-prop hybrid, the hold-out strategy is used. The hold-out method is computationally less expensive and is chosen for this pragmatic reason, since the use of GP in the hybrid model is already computationally expensive.
2.6 A Framework for Learning
A dictionary definition of learning covers a range of statements, from simple memorisation to more complicated mental activities such as gaining knowledge, comprehension or mastery through experience or study. In this thesis an artificial learning process is dealt with. Quite a number of definitions of artificial learning can be found in the AI literature, each of which emphasises different aspects of learning. As previously stated, these definitions usually refer to issues such as knowledge construction, skill acquisition or problem solving. Whatever the reference is, most of the definitions accept learning as a change in the state of the learner or an improved performance due to learning. The variety of the definitions reflects the differences in the way that different AI paradigms treat learning. A close look at learning research will reveal that researchers using genetic-based methods view learning somewhat differently from the symbolic or connectionist paradigms. Some of the differences stem from the nature of the paradigms, but others seem to be the result of usual practices over a period of time and a lack of good interaction between the paradigms. The differences do not mean that any of them has a right or wrong view of learning. However, they lead to artificial learning systems with different goals and performances. The experimental studies in this thesis, which solve some learning problems using genetic-based methods, make it possible to understand the differences in the views, practices and requirements of learning imposed by these paradigms. Thus, a framework for learning is needed to establish a common ground for developing artificial learning systems using any paradigm of AI. In the following paragraphs the elements of an artificial learning framework are presented with an emphasis on suggestions for building genetic-based learning systems:
• A learner who is doing the learning. This is typically a human-designed, simple model of cognition based on a symbolic, connectionist or genetic paradigm. Although it is not very difficult to identify the learner in the symbolic or connectionist paradigms, identifying the learner in genetic-based methods may be difficult. In symbolic and connectionist methods there is a learning mechanism (such as ID3 or back-prop) which operates on some sort of structure (such as a decision tree or artificial neural network). During the learning process there is no change in the way the mechanism operates, but a change occurs in the structure. For example, after applying the same ID3 algorithm, many different types of trees can be generated depending on the problem at hand. Similarly, the same back-prop algorithm can be applied to different artificial neural networks, resulting in different connection strengths. One would be tempted to call either the tree structure or the neural network 'the learner', since the actual change occurs in them. However, after the operation of the learning mechanism has stopped, the structure will not be able to learn any further. Thus, it seems more appropriate to call both the mechanism and the structure the learner. In genetic-based methods, evolution operates on a population of structures. However, though the mechanisms of applying crossover or mutation operators might be somewhat more flexible and varied than the mechanisms mentioned above, the aim of the mechanism is to cause changes in the members of the population. Learning occurs as a result of changes in the population over evolutionary time. The structure which solves the problem best is chosen from among the population members in the last generation. It is important to note that this best individual may not have a complete and clear history of the learning, since it is very likely that it has just appeared in the last generation. In the connectionist or symbolic paradigms one particular structure changes during the process of learning. This can create confusion, because learning normally happens during the lifetime of an individual. However, since the structures in any generation are created by changing the structures in the previous generation by genetic operators, it can, in a way, be argued that an individual has at least one generation's history of the learning. Nevertheless, similarly to the other paradigms, the learner in the genetic-based approach can be viewed as both the mechanism (i.e., evolution) and the structure.
It can be argued that the population of potential solutions which undergo evolution are hypotheses for the target solution. In this respect GAs can be considered as a learning mechanism which evolves a population of hypotheses for a target solution to a particular learning problem. Evolution aids in choosing the better hypotheses as approximations of the target solution.
• A task or domain which is being learned. This is the problem to be solved. It could either be defined by data or be a task to be performed in an environment. The form of the definition of the problem, together with the performance evaluation of the learner, can have an influence on the nature of the learning process. This will be explained in more detail when performance evaluation is discussed below.
• Information source. This can be a teacher providing examples or answers to the queries of the learner. In supervised learning, for example, the information source provides examples of the inputs and the correct outputs during the training period. In most genetic-based learning implementations the information source is the environment to be explored, and the correct outputs may not be readily available.
• Bias or prior knowledge. This is what is initially known and restricts what can be learned about the domain. It determines the learner's degree of uncertainty, bias or expectations about the domain, can be useful in defining the space of possible solutions, and affects how easily a solution can be reached. Bias can be defined as

    any basis for choosing one generalisation over another, other than strict consistency with the instances (Mitchell, 1980)

According to this definition, two different forms of bias can be identified: absolute and relative bias. Absolute bias refers to the assumption by the learning mechanism that the target function to be learned is known to be a member of a particular class of functions. A relative bias, however, refers to the assumption that the target function is more likely to be a member of a particular class of functions than of another. It is argued that every inductive algorithm must adopt a bias in order to generalise beyond the training data. Without a bias, all possible functions must be seen as hypotheses. Since all the functions are part of the hypothesis space, all future outcomes will be equally likely; this clearly prevents establishing a basis for generalisation (Dietterich & Kong, 1995).
In (des Jardins & Gordon, 1995) bias is broadly redefined to include any factor that influences the definition or selection of the hypothesis. The process of learning is viewed as forming a hypothesis that goes beyond the observed training instances. For generalisation to occur, the learner must have some preference for certain hypotheses over others during hypothesis formation. Two major types of bias can be identified: representational and procedural. Representational bias refers to the states in a search space described by the representation language of the hypotheses. For example, choosing a representation for a possible solution (such as disjunctive normal form) or a particular implementation style (rules or a decision tree) determines a representational bias. Procedural bias refers to the mechanism by which the states of the space (defined by the representational language) will be searched. Examples of procedural bias include a learning algorithm's preference for choosing a simpler or more specific hypothesis. The two types of bias may either interact and help each other or conflict (Cobb & Bock, 1994). Representational bias can be described in terms of strength and correctness. A stronger bias corresponds to a smaller space of hypotheses expressible in the representation language; in this respect more concise languages are stronger. If a representational bias is correct, i.e., it can encode the target solution, then learning is possible; otherwise learning is not possible.
Understanding the relationship between bias and the generalisation ability of genetic-based methods can be very important for building better genetic-based learners which focus on generalisation. In genetic-based methods the procedural bias is determined by the genetic operators and may even include the fitness function. For example, in a typical GA application, allowing the fitness function to favour simpler expressions will restrict possible solutions to being simple (whatever that means in a given context). The analysis of the procedural bias of a GA seems very complicated because it depends on several factors, such as the way selection, crossover or mutation works, and the fitness function. At the moment, it suffices to say that for successful generalisation the procedural bias of a GA should reflect an appropriate operation on the representation language chosen. The problem of generalisation also depends on the problem of finding a correct and strong representational bias. Representational bias in genetic-based methods refers to the way genotypes are represented: the size of the alphabet used to encode the genotype, the length of the genotype, whether the length is fixed or not, etc. In this respect the choice of encoding method (i.e., representation language) is critical for building better genetic-based learners.
Solving relational learning problems closely depends on a representation language which should be flexible enough to be manipulated by the evolutionary mechanism such that it is possible to form a higher-order representation (or a re-coding).
• Performance evaluation. A performance criterion should be designated to see whether, or how well, the learner has learned. There can be several different types of performance criteria depending on the goal of learning. The goal of learning might be efficiency, simplicity, consistency, accuracy, and so on. Goals such as efficiency or simplicity are self-explanatory: the goal of learning is to perform a particular task more efficiently, or to produce a simpler solution required for a particular reason. The advantage of simplicity might be human readability or the avoidance of over-fitting.
Consistency refers to how well a learner learns the information presented to it. The aim of the learner is to find a mapping rule that is perfectly consistent with the training data. The earlier belief was that consistency would indirectly achieve a high degree of predictive accuracy when tested on unseen data. However, later research (Quinlan, 1986; Dietterich & Kong, 1995) involving noisy or sparse data, or more complex problems, suggested that a goal of consistency might result in over-fitting the training data. In some cases the learner's performance is not evaluated on any unseen information from the same domain; the learner is expected to learn only whatever information is presented to it. Most of the experiments in genetic-based domains and problem-solving tasks can be considered as focusing on this goal of consistency. In other words, in genetic-based paradigms, viewing learning as performance achievement or problem solving through continuous interaction with the environment and self-organisation seems to be in accordance with such a goal. Thus, GA-based systems may be considered as giving more importance to the earlier belief that consistency is what is required from a learner.
Accuracy refers to the performance of the learner on unseen data after presenting some examples from a particular domain. The learner is first trained and then evaluated based on the degree of its correct prediction of the output given an example input which was not presented in the training period. If the performance of the learner is found to be satisfactory in this testing process, the learner is said to have good predictive accuracy or generalisation. The issue of generalisation as the performance evaluation of a learner has already gained acceptance among machine learning researchers. However, genetic-based researchers do not tend to evaluate the performance of the learner in terms of generalisation. Such a practice limits genetic-based learners to learning situations where achieving consistency is considered to be enough. The thesis suggests that such learners do not generalise but perform compression-based learning (see below). In this thesis, I have shown that performance evaluation of genetic-based learners using generalisation is possible and even necessary to improve what can be achieved in artificial learning research.
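The view of the genetic-based learner given in the first element above (evolution as the mechanism, a population of hypotheses as the structures) can be sketched as a minimal evolutionary loop. The truncation selection, single-parent mutation and parameter values here are illustrative assumptions rather than any specific system from the thesis:

```python
import random

def evolve(fitness, random_hypothesis, mutate, pop_size=20, generations=50, seed=0):
    """Minimal evolutionary learner: the mechanism (selection plus
    mutation) changes a population of hypotheses; the best individual
    in the final generation is returned as the learned structure."""
    rng = random.Random(seed)
    population = [random_hypothesis(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]          # truncation selection
        offspring = [mutate(rng.choice(parents), rng)
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring          # parents kept (elitism)
    return max(population, key=fitness)
```

Because the parents are carried over unchanged, the best fitness in the population never decreases; the returned individual may nevertheless have appeared only in the last generation, as noted above.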
Above, I have presented five elements of learning systems which are appropriate for any learning paradigm and form a common basis for creating artificial learning systems using genetic or non-genetic paradigms of AI. Another source of confusion about learning might be the lack of clear distinctions between different types of learning. In fact, they differ only according to what is expected from a learner. When designing a learning system, care must be taken in deciding what the goal of the system is. Although a learning system might pursue one or more goals at the same time, a clear distinction among them should be made. In the following section such a distinction, viewed in terms of the performance expectations from a learning system, is explained.
2.7 The Issue of Generalisation