
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques
395 Wellington Street, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Daniel L. Silver

Graduate Program in Computer Science

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Faculty of Graduate Studies
The University of Western Ontario
London, Ontario
June 2000

© Daniel L. Silver 2000
ABSTRACT

Within the context of artificial neural networks (ANN), we explore the question: How can a learning system retain and use previously learned knowledge to facilitate future learning? The research objectives are to develop a theoretical model and test a prototype system which sequentially retains ANN task knowledge and selectively uses that knowledge to bias the learning of a new task in an efficient and effective manner. A theory of selective functional transfer is presented that requires a learning algorithm that employs a measure of task relatedness. ηMTL is introduced as a knowledge based inductive learning method that learns one or more secondary tasks within a back-propagation ANN as a source of inductive bias for a primary task. ηMTL employs a separate learning rate, ηk, for each secondary task output k. ηk varies as a function of a measure of relatedness, Rk, between the kth secondary task and the primary task of interest. Three categories of a priori measures of relatedness are developed for controlling inductive bias.

The task rehearsal method (TRM) is introduced to address the issue of sequential retention and generation of learned task knowledge. The representations of successfully learned tasks are stored within a domain knowledge repository. Virtual training examples generated from domain knowledge are rehearsed as secondary tasks in parallel with each new task using either standard multiple task learning (MTL) or ηMTL. TRM using ηMTL is tested as a method of selective knowledge transfer and sequential learning on two synthetic domains and one medical diagnostic domain. Experiments show that the TRM provides an excellent method of retaining and generating accurate functional task knowledge. Hypotheses generated are compared statistically to single task learning and MTL hypotheses. We conclude that selective knowledge transfer with ηMTL develops more effective hypotheses but not necessarily with greater efficiency. The a priori measures of relatedness demonstrate significant value on certain domains of tasks but have difficulty scaling to large numbers of tasks. Several issues identified during the research indicate the importance of consolidating a representational form of domain knowledge.

Keywords: task knowledge transfer, artificial neural networks, sequential learning, inductive bias, task relatedness, knowledge based inductive learning, learning to learn, knowledge consolidation
To my friends and family, especially to my wife, Geri, and mother Dora for their never ending source of love, support and encouragement. To my parents, Dora and Charlie, for their gift of energy, curiosity, and determination. To my children Natalie and Monique for helping me learn how to learn new things.
ACKNOWLEDGEMENTS

The successful completion of this dissertation is due in great part to the support, encouragement, and understanding of my supervisor, Robert E. Mercer. Bob recognized my interest in machine learning, created a research environment and provided intellectual, financial and personal support that never wavered during or after my internship at UWO.

The other members of my Ph.D. committee, Ken McRae and Trevor Cradduck, provided very helpful information and financial support during the early stages of the research effort. I would like to thank Ken McRae for providing important background information on cognitive science and analogical reasoning, research directions in the study of ANNs and psychology, and references to articles on neurophysiology. I would like to acknowledge the significant contribution made by Trevor Cradduck and the Department of Nuclear Medicine, Victoria Campus, of the London Health Science Centre. As part of an on-going effort into methods of automated medical diagnosis, the department provided me with an office, computing and communication equipment, and the financial support necessary for the program. The Department of Computer Science, University of Western Ontario provided additional financial support with travel funding from the Faculty of Graduate Studies.

Over the course of my research I have had numerous discussions via telephone and email as well as in person with other researchers. Many have graciously provided feedback on earlier articles and chapters of this dissertation. In particular, I would like to thank: Nathalie Japkowicz, Andre Trudel, Anthony Robins, Sebastian Thrun, Lorien Pratt, Jonathan Baxter, Rich Caruana, Simon Haykin, Charles Ling, Gilbert Hurwitz, and Piotr Slomka.

I would also like to thank the staff of the Department of Computer Science, UWO for their assistance and friendship through my graduate years. David Wiseman kept the machines running and solved several long-distance connection problems. Ursula Dutz and Sandra McKay were of great assistance during my stay in London. Janice Wiersma provided much help and support after my departure from the UWO campus by assuring that administrative issues were dealt with quickly and professionally.
TABLE OF CONTENTS

CERTIFICATE OF EXAMINATION
ABSTRACT
LIST OF TABLES
LIST OF FIGURES

Chapter 1 INTRODUCTION
1.1 Overview of Problem
1.2 Research Objectives
1.3 Motivation
1.4 Research Approach
1.5 Overview of the Dissertation

Chapter 2 BACKGROUND AND PROBLEM FORMULATION
2.1 Background on Inductive Learning and ANNs
2.1.1 The Framework of Inductive Learning
2.1.2 Inductive Bias and Prior Knowledge
2.1.3 Knowledge Based Inductive Learning
2.1.4 Learning to Learn Theory
2.1.5 Analogical Reasoning
2.1.6 Artificial Neural Networks
2.2 Survey of Knowledge Transfer in ANNs
2.2.1 Fundamental Problems of Knowledge Transfer in ANNs
2.2.2 Summary of Previous Surveys
2.2.3 Representational vs. Functional Transfer
2.2.4 Approaches to Representational Transfer
2.2.5 Approaches to Functional Transfer
2.2.6 Analogy within Artificial Neural Networks
2.3 Major Research Questions
2.3.1 Knowledge Retention: Representational or Functional?
2.3.2 Knowledge Consolidation: How is it done?
2.3.3 Knowledge Transfer: Representational or Functional?
2.3.4 Task Relatedness: What is it? How is it used?
2.3.5 Sequential Learning: Is it possible in ANNs?
2.4 Objectives and Scope of the Research
2.4.1 Objectives
2.4.2 Scope

Chapter 3 FUNCTIONAL TRANSFER and SEQUENTIAL LEARNING
3.1 MTL - A Basis for Selective Functional Transfer
3.1.1 Review of MTL Network Learning
3.1.2 Characteristics and Inherent Biases of MTL Networks
3.1.3 Inductive Bias = Domain Knowledge + Task Relatedness
3.1.4 Inductive Bias, Internal Representation and Related Tasks
3.1.5 Alternative Strategies for MTL Based Task Knowledge Transfer
3.2 A Theory of Selective Functional Transfer
3.3 A Framework for a Measure of Task Relatedness
3.3.1 Hints to a Framework for Employing Task Relatedness
3.3.2 From Framework to Functional Transfer
3.4 The Nature of Task Relatedness
3.4.1 Relatedness expressed as a Distance Metric
3.4.2 Relatedness as Similarity
3.4.3 Relatedness as Shared Invariance
3.4.4 Criteria for a Measure of Relatedness
3.4.5 An Appropriate Test Domain for a Measure of Relatedness
3.5 Measures of Relatedness Explored
3.5.1 Static Measures
3.5.2 Dynamic Measures
3.5.3 Hybrid Measures
3.6 Sequential Learning through Task Rehearsal
3.6.1 Background on Rehearsal of Task Examples
3.6.2 Model for the Task Rehearsal Method
3.6.3 An Appropriate Test Domain for Sequential Learning
3.7 The Functional Transfer Prototype
3.7.1 The ANN Software
3.7.2 The TRM Software

Chapter 4 EXPERIMENTS WITH ηMTL: SYNTHETIC DOMAINS
4.1 The Band Domain
4.2 Experiments using the ηMTL Framework - Band Domain
4.2.1 Experiment 1: Band Domain - Inductive Bias provided by each Task
4.2.2 Experiment 2: Inductive bias from combinations of secondary tasks
4.2.3 Experiment 3: The effect of varying Rk values
4.2.4 Summary
4.3 Experiments using Measures of Relatedness - Band Domain
4.3.1 Experiment 4: Performance of Various Measures
4.3.2 Experiment 5: Sensitivity to dynamic c parameter
4.3.3 Summary
4.4 The Logic Domain
4.5 Experiments using ηMTL and Measures of Relatedness - Logic Domain
4.5.1 Experiment 6: Inductive Bias provided by each Task
4.5.2 Experiment 7: Performance of Various Measures
4.5.3 Summary
4.5.4 Experiment 8: Sensitivity to number of training examples
4.5.5 Experiment 9: Sensitivity to dynamic c parameter
4.5.6 Summary

Chapter 5 EXPERIMENTS WITH TRM: SYNTHETIC DOMAINS
5.1 Experiments using the Task Rehearsal Method - Band Domain
5.1.1 The Impoverished Band Domain
5.1.2 Experiment 10: STL learning of impoverished training sets
5.1.3 Ensuring the Generation of Accurate Virtual Examples
5.1.4 Experiment 11: Sequential learning of impoverished Band tasks
5.1.5 Accuracy and Value of Virtual Examples
5.1.6 Summary
5.2 Experiments using the Task Rehearsal Method - Logic Domain
5.2.1 The Impoverished Logic Domain
5.2.2 Experiment 12: STL learning of impoverished training sets
5.2.3 Ensuring the Generation of Accurate Virtual Examples
5.2.4 Experiment 13: Sequential learning of impoverished Logic tasks
5.2.5 Accuracy and Value of Virtual Examples
5.2.6 Summary

Chapter 6 EXPERIMENTS WITH TRM: APPLIED DOMAIN
6.1 Medical Diagnostic Modelling and KBIL
6.2 Coronary Artery Disease (CAD) Diagnosis
6.3 The CAD Domain
6.4 Experiment 14: Inductive Bias provided by each Task
6.5 Experiment 15: Sequential learning of CAD tasks
6.6 Analysis of the Diagnostic Improvement
6.7 Summary

Chapter 7 DISCUSSION
7.1 Discussion of ηMTL Functional Transfer
7.1.1 ηMTL as a Framework for a Measure of Relatedness
7.1.2 Performance of the proposed Measures of Relatedness
7.1.3 Scalability of ηMTL and the measures of relatedness
7.1.4 Alternative explanations for the success of ηMTL
7.1.5 Related work
7.2 Discussion of the Task Rehearsal Method of Sequential Learning
7.2.1 Performance of the TRM as a sequential learning system
7.2.2 Scalability of TRM
7.2.3 Alternative explanations for the success of TRM
7.2.4 Related work
7.3 Advanced Issues and Open Questions
7.3.1 Complexity of Task Relatedness and Inductive Bias
7.3.2 The Need for Consolidated Domain Knowledge
7.3.3 The Need for Functional Transfer for New Learning
7.4 Selective Functional Transfer and Other Learning Methods
7.4.1 kNN and Knowledge Transfer from Related Tasks
7.4.2 Sequential Learning through Selective Functional Transfer with kNN

Chapter 8 CONCLUSION
8.1 Objectives and Approach of the Research
8.2 Major Findings and Contributions
8.2.1 General Issues
8.2.2 The Prototype Software
8.2.3 The ηMTL KBIL
8.2.4 The Measures of Relatedness
8.2.5 The Task Rehearsal Method
8.3 Suggestions for Future Research

REFERENCES

Appendix A Glossary of Terms and Acronyms
Appendix B Probably Approximately Correct Learning
Appendix C Mathematical Details
C.1 Mutual Information of Secondary Task with respect to Primary Task
C.2 Mutual Information of Hidden Node with respect to a Task Output

VITA
LIST OF TABLES

3.1 Matrix of alternative strategies for MTL based task knowledge transfer.
4.1 Summary of the training, validation and test sets of examples for each task of the Band domain.
4.2 Experiment 1: The inductive bias provided by each secondary task of the Band Domain.
4.3 Experiment 2: The inductive bias provided by groups of secondary tasks of the Band Domain.
4.4 Experiment 3: The effect of manually varying measures of relatedness on the Band domain.
4.5 Experiment 4: The performance of various measures of relatedness under ηMTL on the Band domain.
4.6 Experiment 4: Examples of the Rk values for each of the secondary tasks of the Band domain at the beginning of training and at the point of minimum validation error.
4.7 Experiment 5: Sensitivity of hypothesis development to the dynamic c parameter.
4.8 Experiment 4: Ranking of the various measures of relatedness used on the Band domain.
4.9 Description of the tasks of the Logic Domain.
4.10 Summary of the training, validation and test examples for the eight tasks of the Logic domain for one of the experimental runs.
4.11 Experiment 7: The performance of various measures of relatedness under ηMTL on the Logic domain.
LIST OF FIGURES

2.1 The basic framework for inductive learning.
2.2 The framework for knowledge based inductive learning.
2.3 An artificial neuron.
2.4 An example of how the same ANN can represent different functions depending on the value of the connection weights.
2.5 The multi-layer feed-forward network and the back-propagation algorithm.
2.6 An ANN generates a trajectory through weight space as it learns a new task.
2.7 A Multiple Task Learning (MTL) network.
3.1 Prototype of a simple 3-layer multiple task learning (MTL) network capable of computing any continuous function.
3.2 Prototype of a complex 4-layer MTL network capable of computing an arbitrary function.
3.3 Prototype of a more complex 5-layer MTL network.
3.4 Training error versus the number of batch iterations for 4 hypotheses while learning all 14 non-trivial logic functions of 2 variables within an MTL network.
3.5 An idealized representation of various hypothesis spaces under STL and MTL networks and an optimal hypothesis, h0, for the primary task, T0.
3.6 Mean number of test set misclassifications by the primary task versus the number of hidden nodes within an MTL network.
3.7 Percent misclassifications by primary task hypotheses versus variation in Rk for all secondary tasks.
3.8 A function space showing the proximity of a primary task T0 to secondary tasks T1 and Tk.
3.9 Three functions created from the composition of four Fourier series components.
3.10 Three hypotheses created from the composition of four MTL hidden node features.
3.11 The 3-dimensional surface of Rk = tanh(2.65 rel_k) with training error, Ek, set to the mean cross-entropy.
3.12 The 3-dimensional surface of Rk = tanh(2.65 rel_k) but with training error, Ek, dampened.
3.13 A model for the Task Rehearsal Method.
4.1 The Band domain shown within its 2-variable input space.
4.2 The training, validation and test example sets for the primary task of the Band domain depicted within the 2-variable input space.
4.3 Experiment 1: The inductive bias provided by each secondary task of the Band Domain.
4.4 Experiment 1: STL results on the Band domain.
4.5 Experiment 1: MTL results on the Band domain.
4.6 Experiment 2: The inductive bias provided by groups of secondary tasks of the Band Domain.
4.7 Experiment 2: Learning under ηMTL with R0 = R5 = R6 = 1.0 while R1-R4 = 0.0.
4.8 Experiment 2: Learning under ηMTL with R0 = R2 = R5 = 1.0 while all others are 0.0.
4.9 Experiment 3: The effect of manually varying measures of relatedness on the Band domain.
4.10 Experiment 4: The learning effectiveness of ηMTL on the Band domain.
4.11 Experiment 4: The learning efficiency of ηMTL on the Band domain.
4.12 Experiment 4: ηMTL with Rk based on the static |r| measure.
4.13 Experiment 4: ηMTL with Rk based on the static MI measure.
4.14 Experiment 4: ηMTL with Rk based on the dynamic cos θ measure.
4.15 Experiment 4: ηMTL with Rk based on the dynamic cos φ measure.
4.16 Experiment 4: ηMTL with Rk based on the hybrid |r| + cos θ measure.
4.17 Experiment 4: ηMTL with Rk based on the hybrid |r| + cos φ measure.
4.18 Experiment 4: ηMTL with Rk based on the hybrid MI + cos θ measure.
4.19 Experiment 4: ηMTL with Rk based on the hybrid MI + cos φ measure.
4.20 Experiment 5: Sensitivity of hypothesis development to the dynamic c parameter.
4.21 The ideal 11-10-8 network configuration for all tasks of the Logic domain.
4.22 Experiment 6: The inductive bias by each secondary task of the Logic Domain using a 11-6-8 network.
4.23 Experiment 6: The inductive bias by each secondary task of the Logic Domain using a 11-32-8 network.
4.24 Experiment 7: Learning effectiveness on the Logic domain using a 11-6-8 network.
4.25 Experiment 7: Learning effectiveness on the Logic domain using a 11-32-8 network.
4.26 Experiment 7: STL results on the Logic domain.
4.27 Experiment 7: MTL results on the Logic domain.
4.28 Experiment 7: ηMTL with Rk based on the dynamic cos φ measure.
4.29 Experiment 7: ηMTL with Rk based on the static |r| measure.
4.30 Experiment 7: ηMTL with Rk based on the hybrid |r| + cos φ measure.
4.31 Experiment 8: Sensitivity of hypothesis development to number of training examples.
4.32 Experiment 9: Sensitivity of hypothesis development to the dynamic c parameter.
5.1 Impoverished training examples for the 7 tasks of the Band domain within their 2-variable input space.
5.2 Experiment 10: Learning effectiveness of hypotheses developed by STL for all 7 tasks of the impoverished Band domain.
5.3 Experiment 10: Classification of a test set for (a) task T3 and (b) task T0 by STL hypotheses developed directly from the impoverished training examples.
5.4 Experiment 11: Comparison of TRM sequential learning trials on the impoverished Band domain.
5.5 Experiment 11: Sequential learning of T3 under TRM and MTL.
5.6 Experiment 11: Sequential learning of T3 under TRM and ηMTL with Rk based on the hybrid |r| + cos φ measure of relatedness.
5.7 Experiment 11: Sequential learning of T0 under TRM and MTL.
5.8 Experiment 11: Sequential learning of T0 under TRM and ηMTL with Rk based on the hybrid |r| + cos φ measure of relatedness.
5.9 Experiment 11: Classification of a test set by STL, MTL and ηMTL hypotheses for task T3 of the impoverished Band domain.
5.10 Experiment 11: Classification of a test set by STL, MTL and ηMTL hypotheses for task T0 of the impoverished Band domain.
5.11 An example of the ability of the TRM to generate accurate virtual examples from impoverished Band domain training sets.
5.12 Relearning of T0 of the Band domain under the TRM but with only the 10 training examples that have target values.
5.13 A comparison of hypotheses developed for T0 of the Band domain from Experiments 4 and 11.
5.14 Experiment 12: STL learning effectiveness for all 8 tasks of the impoverished Logic domain.
5.15 Experiment 13: Bar graphs of learning effectiveness on the impoverished Logic domain.
5.16 Experiment 13: Sequential learning of T1 under TRM and MTL.
5.17 Experiment 13: Sequential learning of T1 under TRM and ηMTL with Rk based on the hybrid |r| + cos φ measure of relatedness.
5.18 Experiment 13: Sequential learning of T1 under TRM and ηMTL with Rk based on the dynamic cos φ measure of relatedness.
5.19 Experiment 13: Sequential learning of T0 under TRM and MTL.
5.20 Experiment 13: Sequential learning of T0 under TRM and ηMTL with Rk based on the hybrid |r| + cos φ measure of relatedness.
5.21 A comparison of hypotheses developed for T0 of the Logic domain under the TRM from different numbers of virtual examples per secondary task.
6.1 Experiment 14: Inductive bias provided by each task of the CAD domain to the vamc task.
6.2 Experiment 15: Performance results from sequential learning on the CAD domain tasks.
6.3 Experiment 15: TRM sequential learning of the vamc task using ηMTL and the |r| + cos φ measure of relatedness.
6.4 Experiment 15: Detailed performance statistics for vamc hypotheses developed under the TRM.
7.1 Learning effectiveness with a large number of unrelated secondary tasks.
7.2 Learning effectiveness with a large number of related secondary tasks.
Chapter 1

INTRODUCTION

Over the last two decades of the 20th century an important area of research associated with Artificial Intelligence has been concerned with the construction of computer systems that improve with experience. This field of research has become known as Machine Learning [Mitc97].

Much progress has been achieved in machine learning since the early 1980s. The most noticeable and practical result is the wide-spread application of machine learning software in science, business and industry. Commercial products that use machine learning software have helped create new terms such as data mining and intelligent agents and encouraged new enterprises engaged in activities such as automated knowledge discovery and user profiling. Subsequently, machine learning has caught the imagination of a new generation of young scientists and professionals as well as corporate decision makers.
Despite these impressive results, from an academic perspective there is still much for machine learning to accomplish. If, ultimately, the goal is to develop a machine that is capable of learning at the level of the human mind, the journey has only just begun. This thesis takes one step in that journey by exploring an outstanding question in machine learning that deserves further attention: How can a learning system retain learned knowledge and use that knowledge to facilitate future learning? A better understanding of this question and the testing of possible solutions is at the heart of this dissertation.
This introductory chapter is divided into five sections. Section 1.1 provides an overview of the research problem. Section 1.2 defines the general objectives of the research. Section 1.3 presents the motivation for this dissertation. Section 1.4 describes the approach that was taken. Finally, Section 1.5 provides an overview of the subsequent chapters and the structure and flow of the document.
1.1 Overview of Problem

The vast majority of machine learning research has focused on the tabula rasa approach of inducing a model of a classification task from a set of supervised training examples. Consequently, most machine learning systems do not take advantage of previously acquired task knowledge when learning a new and potentially related task. Unlike human learners, these systems are not capable of sequentially improving upon their ability to learn tasks.
The ability to learn sequences of tasks would be of benefit from both a theoretical and a practical perspective. Learning theory tells us that the development of a sufficiently accurate model from a practical number of examples depends upon an appropriate inductive bias, one source of which is knowledge that can be derived from the models of previously learned tasks [Mitc80]. We will refer to this knowledge as prior task knowledge. Furthermore, a learning system that makes use of prior knowledge can train more efficiently and require fewer training examples [Baxt95b]. Currently, there is no adequate theory of how task knowledge can be retained and then selectively transferred when learning a new task [Thru97a, Caru97a].
From a practical perspective, many applications of machine learning systems, such as data mining and intelligent agents, suffer from a deficiency of training examples and could benefit from the use of prior task knowledge. For example, a more accurate medical diagnostic model could be developed from a small sample of patients if related diagnostic models were available and accessible to the learning system. Alternatively, the user profile for a new email user could be learned more rapidly if prior knowledge of similar user profiles were considered by the learning component of the mail tool.
1.2 Research Objectives

This thesis investigates the retention and integration of task knowledge after it has been induced and its selective recall and use when learning a new task. Integration of prior task knowledge will be referred to as consolidation. Selective recall and use of prior knowledge will be referred to as transfer. The thesis focuses on systems of artificial neural networks (ANNs) that are capable of sequentially learning a series of tasks from the same problem domain using knowledge consolidation and transfer.
The research reported here has two objectives. The first objective is to develop a theoretical model of knowledge transfer that uses previously induced task knowledge to minimize the number of training examples required for learning a new task to an acceptable level of generalization accuracy and to decrease the training time for that task. The second objective is to build a prototype system that tests the theoretical model. The prototype system will be tested against specially designed synthetic domains of tasks to verify the theory. The system will also be applied to a practical decision making problem in the field of medicine where there is an important need for the accumulation and use of domain knowledge.
1.3 Motivation

This research proposes new methods of sequential consolidation and transfer of task knowledge within the context of artificial neural networks and tests these methods against synthetic and real-world problem domains. Primarily, the perspective is that of computer and cognitive science. However, it is unavoidable to consider the results from other fields that have contributed to the development of machine learning theory and ANN learning algorithms. Therefore, motivation comes from several fronts: cognitive science, psychological and physiological evidence, advances in computational learning theory, and the desire to solve real-world problems.
Cognitive Science. The question of how the human mind is able to store and later utilize acquired task knowledge is central to this research. As a child, a person first learns to drive a scooter and then a tricycle. Later, he learns his balance on a bicycle and the rules for driving on a public highway. As a young adult the person learns to drive various motorized vehicles such as cars and motorcycles. By the time the person is 30 years of age he has acquired and consolidated knowledge concerning the control of a number of vehicles. The sequence of task learning, from simple to complex, provides the learner with not only knowledge of how to drive each machine but also knowledge of the commonalities and differences across a domain of vehicles. This domain knowledge can be used to learn more easily how to drive any new vehicle. Persons who have acquired a vast array of domain knowledge and are able to use it effectively when solving new problems are often referred to as experts.

If a machine learning system is to emulate the ability of human learning to use domain knowledge, it must have a method of acquiring such knowledge in the first place. It makes sense that this method be, itself, a learning process or, better stated, a meta-learning process [JL53, Holl89]. As in the case of a child, a learning system should start as a pure inductive learner having no background knowledge of the problem domain. This makes for a slow and unsure beginning that is typified by trial and error. However, as tasks are learned successfully, knowledge of the domain increases and the learning system should no longer pursue naive models that clearly fly in the face of previous experience. Therefore, motivation for this research comes from the desire to create machine learning systems that benefit from learning many different tasks over a lifetime.
Psychological and neurophysiological evidence. Psychological evidence [Harl49, Marx44, Ward37] indicates that the effectiveness and efficiency of the mammalian brain in learning a new task is closely related to knowledge of similar tasks. If the task is similar to previous tasks, a positive transfer of task knowledge will occur. If the task is dissimilar to previous experience, it is likely that a negative transfer of knowledge will occur. Furthermore, there is an abundance of evidence that during the learning process humans and animals develop not only specific discriminate models but also a sensitivity to similar structural relations among the input stimuli [Keho88].
Research led by James McClelland has influenced the research direction of this thesis. In [McCl94], McClelland discusses the process of memory consolidation:

"... we suggest that the neocortex may be optimized for the gradual discovery of the shared structure of events and experiences, and that the hippocampal system is there to provide a mechanism for the rapid acquisition of new information without interference with previously discovered regularities. After this initial acquisition, the hippocampal system serves as teacher to the neocortex: That is, it allows for the reinstatement in the neocortex of representations of past events, so that they may be gradually acquired by the cortical system via interleaved learning. We equate this interleaved learning process with consolidation, and we suggest that it is necessarily slow so that new knowledge can be integrated effectively into the structured knowledge contained in the neocortical system."

In addition, there is evidence that, by way of bi-directional connections, the neocortex provides information to the hippocampus during short-term learning that may positively influence the generation of episodic models [Squi92].
Advances in computational learning theory. Theoretical developments point to the need for inductive bias during learning [Mitc80]. In fact, without a source of guidance during the learning process, there is little hope of ever producing a machine that can learn most real-world tasks. Domain knowledge has been recognized as a major source of inductive bias. The challenge is to discover a method for retaining knowledge from previous learning experiences and for using that knowledge selectively to benefit future learning. Researchers like Paul Utgoff feel that the search for the most appropriate inductive bias is a fundamental part of machine learning. As he discusses in [Utgo86]:

"An inductive concept learning program ought to conduct its own search for appropriate bias. Until programs have such capability, the search for appropriate bias will remain a manual task dependent on a human's ability to perform it. This gap in understanding constitutes the largest weakness in current methods of machine learning of concepts from examples."

Motivation for this research comes from the desire to find ways to eliminate this weakness.
The desire to solve real-world problems. The complexity of many real-world problems means that the current theories of learning do little to ensure that the models that are developed can be relied upon. For example, pure inductive learning, as formalized by the Probably Approximately Correct (PAC) theory, dictates that for most real-world tasks a very large number of training examples must be used if a reliable model is to be found by a learning system (see Appendix B). The reality is that it is often difficult to find large numbers of examples. Practicalities such as cost, risk to human health, or privacy prohibit the collection of data sets of sufficient size. A common variant of this problem is an unbalanced set of data in which the number of examples for one class is substantially higher than all other classes. Subsequently, most statistical and machine learning studies are ill-posed on the basis that the sample size is smaller than is required to formulate an accurate hypothesis. Nonetheless, human experts have a great deal of success at developing complex models that demonstrate good generalization. One reason for this is the experts' use of knowledge of the problem domain to constrain the search for a practical model. Therefore, a further motive for this research is to overcome the sample complexity requirement of pure inductive learning by acquiring and using domain knowledge.
Machine learning and medical decision making. The field of medicine is ripe for the application of machine learning technology [Scot93]. There have already been a number of successes using inductive decision trees, instance based learning methods, case based reasoning systems, and ANNs [Dets91]. A survey of medical and bio-medical journals since 1989 produced over 700 articles on the application of neural networks in biomedical research and clinical diagnosis (for examples see [Baxt91b, Baxt91a, Asti92, Akay93, Boon93, Daws92]). At the 1995 World Congress on Neural Networks in Washington, DC, there was a two-day session dedicated to the use and regulation of neural networks in medicine. At that session a major point was made that the complexities of medical science today are forcing the use of automated tools for clinical diagnosis [Burk93]. Furthermore, the non-linear interaction of various diagnostic attributes requires the use of sophisticated modelling systems. This is bringing about a shift from data-based analysis to model-based analysis.

Typically, the complexity of medical decision making problems means that large numbers of training examples are required to develop reliable diagnostic models using pure inductive learning systems. But realistically, on a per doctor, per hospital, or even per region basis, there is an insufficient number of examples to meet this requirement. Therefore, a motive for this research is the desire to create better medical decision making systems that continually acquire and utilize domain knowledge for the development of more accurate diagnostic models.
1.4 Research Approach

As stated in Section 1.2, the objective of this thesis is to investigate the retention and integration or consolidation of task knowledge after it has been induced, and its selective recall or transfer to facilitate the learning of a new task. The focus is on systems of ANNs that are capable of sequentially learning a series of tasks from the same problem domain.
The research effort began with a general definition of the problem of knowledge transfer along with a broad set of research objectives. The definition and objectives guided a thorough survey of existing literature and current research, covering:

- task and skill transfer, knowledge consolidation and analogical reasoning from the fields of artificial intelligence, psychology and neuroscience;
- important areas of computational learning theory such as probably approximately correct (PAC) learning (see Appendix B) and inductive bias;
- mathematical, statistical and psychological definitions of relatedness and similarity, particularly in regard to task knowledge;
- specific features of ANNs such as catastrophic interference, stability and plasticity, network weight initialization, and multiple task learning; and
- participation with fellow researchers at related conferences and workshops.
From the breadth and depth of the survey information a more focused research objective was formulated. In addition, the survey information identified two fundamentally different approaches to knowledge transfer: representational transfer and functional transfer. A summary of the surveyed material and the formulation of the specific research objective is presented in Chapter 2.

Pursuing, at first, representational transfer, the research took on an experimentalist approach made up of the following sequence of steps:

- the generation of knowledge transfer theory;
- the development of a prototype sequential learning system; and
- experimentation using the prototype on synthetic and real-world domains of tasks.
This same approach was subsequently used to investigate functional task knowledge transfer. In total, over 23 task knowledge consolidation and transfer experiments were conducted using two sequential learning systems on either synthetic task domains using computer generated examples or real-world task domains using collected examples. The difficulties encountered with the representational transfer theory and experimental software are briefly discussed in Section 2.3.5. The more successful functional transfer theory, prototype system, and experiments are presented in Chapters 3 through 6. The importance of both representational and functional transfer and a strategy for integrating the two are discussed in Chapter 7.
1.5 Overview of the Dissertation

The remainder of the thesis is organized into eight chapters. Chapter 2 begins with a summary of the relevant background material on inductive learning, inductive bias and ANNs. Then a survey of recent work on task knowledge transfer and sequential learning in the context of ANNs is presented. The background and survey material is consolidated into a set of major open research questions. Finally, based on a subset of the research questions, the objectives and scope of the research are specified.
Chapter 3 develops a theory of functional knowledge transfer that uses a modified version of the multiple task learning (MTL) ANN method called ηMTL and the concept of task rehearsal based on the generation of virtual examples. The theory requires that three requirements be satisfied: (1) that there exists a method of sequentially retaining and transferring functional task knowledge in the form of virtual examples, (2) that ηMTL provides a framework for employing a measure of task relatedness for the selective transfer of knowledge, and (3) that suitable measures of relatedness can be found for a domain such that appropriate transfer occurs. The chapter explores various solutions to each of these requirements and proposes a suitable test domain. The chapter finishes by describing a prototype software system that has been developed.
In Chapter 4, solutions to the second and third requirements of the theory of selective functional transfer are tested on two synthetic domains of tasks of varying degrees of relatedness. Using the prototype system, ηMTL is tested as a framework for selectively transferring knowledge from secondary source tasks to a primary task by manually manipulating the measure of relatedness for each secondary task. Experiments are then conducted on both domains using as many as eight different automated measures of relatedness. The results from the experimental runs are compared to one another and to results from single task learning and standard multiple task learning. The important findings are discussed in the last portion of the chapter.
In Chapter 5, the Task Rehearsal Method (TRM) of sequential learning is tested as a solution to the first subproblem of the theory of functional knowledge transfer. Using the prototype system, experiments using TRM are conducted on the two synthetic domains of tasks introduced in Chapter 4. The prototype system's ability to sequentially retain, consolidate and transfer knowledge is tested using MTL and ηMTL and several measures of relatedness. The results are compared to single task learning for each task of the sequence.
Chapter 6 reports on the application of functional transfer to a real-world medical domain. The subject is the diagnosis of coronary artery disease. Sequential learning of seven tasks tests the ability of the functional transfer prototype to consolidate and transfer knowledge from previously learned tasks.

Chapter 7 discusses the outcomes of the experiments reported in the previous chapters, presents a number of advanced issues and open questions that arise from the research, and considers selective functional transfer using other machine learning systems.
Chapter 8 concludes with a summary of the objectives and approach of the research, a list of the important findings and contributions made, and suggestions for future work. Following Chapter 8, a series of appendices provide a glossary of acronyms and terminology used throughout the document and detailed information on important theory and mathematics discussed in various chapters.
Chapter 2

BACKGROUND AND PROBLEM FORMULATION

This chapter presents a summary of relevant background material followed by formulation of the research problem and the objective and scope of the research effort. The first section presents foundational material on inductive learning, inductive bias and artificial neural networks. The second section summarizes a survey of literature on task knowledge transfer and sequential learning within the context of ANNs. The third section consolidates the background and survey material and identifies several key research issues. The final section defines the research problem and the objective and scope of the dissertation.
2.1 Background on Inductive Learning and ANNs

A research program concerned with knowledge transfer and sequential learning in the context of neural networks requires foundational background in inductive and analogical reasoning, computational learning theory, and artificial neural networks. This section presents a summary of relevant material in each of these areas.
2.1.1 The Framework of Inductive Learning

Many phenomena in the world can be expressed as functions which map from a set of input variables, or attributes, to an output variable or a set of output variables. For example, the amount of electrical current flowing through a wire is a function of the impedance of the components in the circuit and the voltage applied. One type of function is a classification task, or classifier, f, which maps a set of input values, x, to one of a discrete set of target output values, f(x), normally referred to as classes or categories. For example, the presence of coronary artery disease can be classified based on clinical attributes such as age, gender, and blood pressure, as well as the findings of diagnostic tests that check for abnormal heart rhythm during states of stress and rest. When there are only two classification values, such as "has disease" or "does not have disease", a classification task is sometimes referred to as a concept. This research will concern itself primarily with concept learning.
Figure 2.1: The basic framework for inductive learning.
Inductive learning is an inference process which constructs a model or hypothesis, h, of a classification task, f, from a set of training examples. Inductive learning is said to be supervised when each example includes its correct target class as a training signal. This research will deal strictly with supervised inductive learning; henceforth, the use of the word "learning" is meant to imply this supervised format.
Figure 2.1 presents the basic framework in which a supervised inductive learning system may be positioned. There is a source of error-free classification examples referred to as the environment. Let X be an instance space of possible input values x_i with some fixed probability distribution of occurrence, D. To learn a concept task f that is a function of the inputs, a random sample according to D is taken from X to create a set of training examples:

$$S = \{(x_1, f(x_1)), (x_2, f(x_2)), \ldots, (x_m, f(x_m))\}$$

Each training example must consist of a set of input attribute values, x_i, as well as a target class value, t_i, such that t_i = f(x_i).
The objective of an inductive learning system, L, is to select or develop the hypothesis, h (induced model of the classifier), using the training examples, S, such that h approximates the actual task below a desired level of error, ε. Let the true error of the hypothesis be defined as

$$e(h, f) = \Pr_{x \in D(X)}[\,h(x) \neq f(x)\,]$$

Then, formally,

$$S \succ h, \ \text{such that} \ e(h, f) < \epsilon$$

where a ≻ b indicates that b is inductively inferred from a.
The expression states that the learning system should use the available training examples to induce a hypothesis, h, such that the portion of examples in X misclassified by h is less than ε. Preferably, as the number of examples m (referred to as the sample size) increases, L should develop a hypothesis h of increasing accuracy. Clearly, if all examples in X are seen by the learning system, given sufficient representation, the system should be able to construct a perfect h, such that h(x_i) = f(x_i) for all x_i in X.
The degree of approximation for a hypothesis can be estimated by its classification performance against a set of previously unseen test examples drawn from environment X according to distribution D. This estimation, known as the empirical or generalization error, is some measure er(h(x_i), f(x_i)) for all (x_i, f(x_i)) in the test set. Preferably, the error, e'(h, f), across all test examples is less than or equal to some previously agreed upon value. A hypothesis with good classification performance on test examples is said to exhibit good generalization or high generalization accuracy.
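To make the estimate concrete, the following minimal sketch (in Python; it is illustrative only and not the prototype software described later in this thesis) computes the empirical error of a hypothesis over a held-out test set. The threshold hypothesis and the five labelled pairs are invented for the example.

    # A minimal sketch of empirical (generalization) error estimation.
    # The hypothesis h and the test set below are illustrative placeholders.
    def empirical_error(h, test_set):
        """Fraction of test examples (x, f(x)) that hypothesis h misclassifies."""
        errors = sum(1 for x, fx in test_set if h(x) != fx)
        return errors / len(test_set)

    # Example: a simple threshold hypothesis tested on labelled pairs (x, f(x)).
    h = lambda x: 1 if x > 0.5 else 0
    test_set = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.3, 1)]
    print(empirical_error(h, test_set))  # 0.2 -- one of five examples misclassified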
Consider a simple learning system which has available a hypothesis space H containing a finite number of distinct hypotheses. Assume a training sample S containing m examples of some task f. For each h and for each example of S, compare the output of the hypothesis to the supervised target class. If the hypothesis disagrees with any target class, then reject it. A good choice of hypothesis is one that agrees, or is consistent, with the entire sample S, if such a hypothesis exists.
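A minimal sketch of this filtering procedure, assuming a toy finite hypothesis space of three Boolean functions (the candidate functions and the sample below are invented for illustration):

    # Reject any hypothesis that disagrees with a supervised target class;
    # keep those that are consistent with the entire training sample S.
    def consistent(h, sample):
        """True if hypothesis h agrees with every training example."""
        return all(h(x) == fx for x, fx in sample)

    # An illustrative finite hypothesis space H over 2-bit inputs.
    H = {
        "AND": lambda x: x[0] and x[1],
        "OR":  lambda x: x[0] or x[1],
        "XOR": lambda x: x[0] ^ x[1],
    }

    S = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # examples of OR

    survivors = [name for name, h in H.items() if consistent(h, S)]
    print(survivors)  # ['OR'] -- the only hypothesis consistent with all of S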
Constructing a hypothesis with good generalization is not simply a matter of memorizing the training examples. This would produce an effect known as over-training, which is analogous to over-fitting a complex curve to a set of data points. The model becomes too specific to the training examples and does poorly on a set of test examples. Instead, the inductive learning system must discern from the training examples those global regularities which properly discriminate between the classes.
This raises an important question: How many examples are required by a learning system before it can ever hope to produce a hypothesis with good generalization? An answer to this was provided by Leslie G. Valiant in 1984 when he proposed the probably approximately correct, or PAC, theory of learning [Vali84]¹. The PAC model of learning characterizes training examples by their statistical properties, and measures the error in the hypothesis produced by a learning system in light of the same statistical properties.
¹This can be viewed as a probabilistic extension of E. M. Gold's identification in the limit paradigm [Gold67].
Valiant defines algorithm L to be a probably approximately correct learning algorithm² for a classification task f in the hypothesis space H if for

- a confidence level δ, such that 0 < δ < 1, and
- an error threshold ε, such that 0 < ε < 1,

there exists a positive integer m_0, a function of (δ, ε), such that for any target concept f in H, and for any probability distribution over the example space D(X), whenever m ≥ m_0, then

$$D(\{S : e(h, f) < \epsilon\}) > 1 - \delta$$
Thus, the minimum number of examples m_0 depends upon the values δ and ε, but never on the target classification task f or the unknown distribution D. This means that the desired level of confidence and error can be set even though the target task and the distribution of examples is unknown.

Furthermore, Valiant showed that any finite hypothesis space H is potentially learnable and that the number of examples required to learn a consistent hypothesis h is given by:

$$m_0 \geq \frac{1}{\epsilon} \ln\left(\frac{|H|}{\delta}\right)$$
where |H| is the number of hypotheses in the hypothesis space. This is an important theorem since it tells us how many examples are required to have a consistent learning algorithm achieve a level of confidence δ with an error rate less than ε. In fact, this theorem covers all Boolean hypothesis spaces {0, 1}^n for a fixed n. All such spaces are potentially learnable, and any algorithm which consistently learns a hypothesis can be considered PAC compliant.
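As a rough illustration of how this bound behaves (the snippet is not from the thesis, and the choices of n, ε and δ are arbitrary), consider the space of all Boolean functions of n variables, for which |H| = 2^(2^n) and therefore ln|H| = 2^n ln 2:

    # A worked illustration of the sample-complexity bound
    # m0 >= (1/eps) * ln(|H| / delta) for a consistent learning algorithm.
    import math

    def pac_sample_bound(ln_H, eps, delta):
        """Minimum m0 for confidence 1 - delta and error threshold eps.
        ln_H is ln|H|, passed directly so enormous spaces do not overflow."""
        return math.ceil((ln_H + math.log(1.0 / delta)) / eps)

    # All Boolean functions of n variables: |H| = 2^(2^n), ln|H| = 2^n * ln 2.
    for n in (3, 4, 5):
        ln_H = (2 ** n) * math.log(2.0)
        print(n, pac_sample_bound(ln_H, eps=0.1, delta=0.05))
    # prints roughly: 3 86, 4 141, 5 252

Even for five Boolean inputs the bound stays modest, but ln|H| grows exponentially with n, which foreshadows the tractability problem raised in the next subsection.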
2.1.2 Inductive Bias and Prior Knowledge

In practice the PAC theory has its limitations. As with all computing engines, machine learning systems must perform using finite time and space resources. Although a PAC learning system is theoretically able to learn a hypothesis to a desired level of accuracy, it may require an unrealistic number of examples, amount of memory and time to compute. Therefore, in practice, a PAC learning system must avoid using an unreasonably large hypothesis space (such as the space of all Boolean equations of n variables) since the search of this space for an accurate hypothesis may be intractable³. This is in conflict with the fact that most complex real-world problems have large hypothesis spaces.
²For greater detail the reader is referred to Appendix B.
³For this reason, PAC research has concentrated on the discovery of domains (e.g. boolean conjunctive normal form equations of k terms and n variables = CNF(n, k)) which can be learned to a level of generalization error in polynomial time. Such domains are said to be efficiently PAC, or EPAC, learnable.
For real-world problems, some strategy, or heuristic, must be employed to "intelligently" restrict the hypothesis space, H, thereby making the search process computationally efficient [Hert91]. Any constraint of a learning system's hypothesis space, beyond the criterion of consistency with the training examples, is called inductive bias. All learning systems have a bias that favours certain types of hypotheses over others. ANN learning algorithms naturally favour hypotheses with weight values that are small in magnitude. Inductive decision tree learning algorithms tend to favour hypotheses that are represented by small trees instead of those that are represented by larger, bushier trees. Practitioners have been forced to take an ad hoc approach to the application of machine learning systems: first one algorithm is used (e.g. a neural network) and then another (e.g. an inductive decision tree) in the hopes of finding the appropriate inductive bias for the problem. It has been shown on several occasions that there cannot exist a universal inductive learning algorithm, be it biological or machine, which can perform equally well on all classification tasks for some example space X [Roma94]. Instead, every learner utilizes an inductive bias which favours it in some domains while handicapping it in others.

Inductive bias is essential for the development of a hypothesis with good generalization in a tractable amount of time and from a practical number of examples [Mitc80, Mitc97]. Inductive bias characterizes the methods a learning system uses to develop a generalized model of a task, beyond the information found in the available training examples. The inductive bias of a learning system can be considered the learning system's preference for one hypothesis over another. This preference produces a partial ordering over the hypothesis space of the learning system that can be used to reduce the average time to find a sufficiently accurate hypothesis.
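As a hypothetical sketch (not the thesis prototype), the preference can be realized as a search policy in which a complexity score stands in for the bias, so the simplest hypothesis consistent with the sample is returned first:

    # Preference bias as an ordering over the hypothesis space: examine
    # candidates from simplest to most complex and return the first one
    # that is consistent with the training sample.
    def biased_search(H, sample, complexity):
        """H: iterable of hypotheses; complexity: the bias ordering over H."""
        for h in sorted(H, key=complexity):
            if all(h(x) == fx for x, fx in sample):
                return h  # first (most preferred) consistent hypothesis
        return None       # no consistent hypothesis exists in H

Under such an ordering, an Occam-style learner never examines a more complex hypothesis until every simpler one has been ruled out, which is how the preference reduces the average time to find an acceptable hypothesis.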
Definition: Formally, as per [Mitc97], we define the inductive bias of a learning system L to be the set of assumptions B such that for a set of training examples, S, from an instance space X for a concept f(x_i):

$$B \wedge S \wedge x_i \vdash h(x_i)$$

where a ⊢ b indicates that b can be logically deduced from a. This expression defines inductive bias as the set of additional assumptions B that justifies the inductive inference of a learning system as a provable deductive inference. The inductive bias can be considered correct if $(\forall x_i \in X)\; e(h, f) = 0$.
Currently, in most learning systems the inductive bias remains fixed. For example, an inductive decision tree system with a preference for smaller trees orders the list of possible hypotheses starting with the smallest possible tree of only one node. Ideally, a learning system is able to change its inductive bias to tailor its preference for hypotheses according to the task being learned. The ability to change inductive bias requires the learning system to have prior knowledge about some aspect or aspects of the task. Furthermore, it suggests that the accumulation of prior knowledge at the inductive bias level is a useful characteristic for any learning system.
Acquiring and using prior knowledge as a source of inductive bias is one of the unsolved problems in learning theory. However, some advances have been made over the last 20 years. Five major classes of inductive bias used by intelligent learners are cited in [Mitc80]: universal heuristics, knowledge of intended use, knowledge of the source, analogy with previously learned tasks, and knowledge of the task domain. All are forms of prior knowledge used to facilitate the search of hypothesis space.
Universal Heuristics. Universal heuristics for learning are methods of bias which force a priori conditions on the induction process independent of the task or problem domain. The most accepted and widely applied universal heuristic is the Law of Economy, often referred to in machine learning as Occam's Razor. The name derives from William of Ockham (1285-1349) who first stated "non sunt multiplicanda entia praeter necessitatem", which can be translated as "entities are not to be multiplied beyond necessity" or "plurality should not be assumed without necessity". This is congruent with the common belief that the simplest explanation of observed facts is often the best explanation. The work of Kolmogorov, Solomonoff, and Levin [LiVi93, Anth92], Blumer et al. [Blum87], and Rissanen [Riss78, Riss89] has given formal support to this intuition by showing that the optimal method of representing a series of examples is based on a minimization of the description length of the hypothesis. For machine learning, Occam's