Lerende Machinekes
Machine Learning: Introduction and Classical ML Algorithms (1)
26 April 2006
Antal van den Bosch
Machine Learning
"The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience." (Mitchell, 1997)
• Dynamic process: learner L shows improvement on task T after learning.
• Getting rid of programming.
• Handcrafting versus learning.
• Machine learning is task-independent.
Machine Learning: Roots
• Information theory
• Artificial intelligence
• Pattern recognition
• Took off during the 1970s
• Major algorithmic improvements during the 1980s
• Forking: neural networks, data mining
Machine Learning: 2 Strands
• Theoretical ML (what can be proven to be learnable, and by what?)
  – Gold: identification in the limit
  – Valiant: probably approximately correct (PAC) learning
• Empirical ML (on real or artificial data)
  – Evaluation criteria:
    • Accuracy
    • Quality of solutions
    • Time complexity
    • Space complexity
    • Noise resistance
Empirical ML: Key Terms 1
• Instances: individual examples of input-output mappings of a particular type
• Input consists of features
• Features have values
• Values can be
  – Symbolic (e.g. letters, words, …)
  – Binary (e.g. indicators)
  – Numeric (e.g. counts, signal measurements)
• Output can be
  – Symbolic (classification: linguistic symbols, …)
  – Binary (discrimination, detection, …)
  – Numeric (regression)
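The terminology above can be made concrete with a small sketch; the class and field names below are illustrative, not from the original course material:

```python
# A hypothetical representation of the key terms: an instance maps an
# input (a tuple of feature values) to an output (here a symbolic class).
from dataclasses import dataclass
from typing import Tuple, Union

# A feature value may be symbolic, binary, or numeric.
Value = Union[str, bool, float]

@dataclass
class Instance:
    features: Tuple[Value, ...]  # the input: one value per feature
    label: str                   # the output: a symbolic class

# A toy instance base mixing value types: a word feature, a binary
# indicator, and a numeric count.
instance_base = [
    Instance(("the", True, 3.0), "DET"),
    Instance(("runs", False, 1.0), "VERB"),
]
```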
Empirical ML: Key Terms 2
• A set of instances is an instance base
• Instance bases come as labeled training sets or unlabeled test sets (you know the labeling; the learner does not)
• A ML experiment consists of training on the training set, followed by testing on the disjoint test set
• Generalisation performance (accuracy, precision, recall, F-score) is measured on the output predicted on the test set
• Splits into train and test sets should be systematic: n-fold cross-validation
  – 10-fold CV
  – Leave-one-out testing
• Significance tests on pairs or sets of (average) CV outcomes
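The experiment-and-CV procedure above can be sketched in a few lines; the fold-splitting strategy and the majority-class stand-in learner are illustrative assumptions, not part of the original slides:

```python
# A minimal sketch of n-fold cross-validation: the instance base is
# split into n disjoint folds; each fold serves once as the test set
# while the remaining folds form the training set.
from collections import Counter

def n_fold_cv(instances, labels, n=10):
    """Yield (train_indices, test_indices) pairs for n disjoint folds."""
    indices = list(range(len(instances)))
    folds = [indices[i::n] for i in range(n)]  # systematic interleaved split
    for i in range(n):
        test_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx

def accuracy(truth, predictions):
    return sum(t == p for t, p in zip(truth, predictions)) / len(truth)

# Toy experiment: a majority-class "learner" evaluated with 5-fold CV.
data = list(range(20))
y = ["a"] * 15 + ["b"] * 5
scores = []
for train, test in n_fold_cv(data, y, n=5):
    majority = Counter(y[j] for j in train).most_common(1)[0][0]
    scores.append(accuracy([y[j] for j in test], [majority] * len(test)))
mean_acc = sum(scores) / len(scores)
print(f"mean CV accuracy: {mean_acc:.2f}")
```

Leave-one-out testing is the special case `n = len(instances)`, i.e. each fold holds a single instance.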
Empirical ML: 2 Flavours
• Greedy
  – Learning: abstract a model from the data
  – Classification: apply the abstracted model to new data
• Lazy
  – Learning: store the data in memory
  – Classification: compare new data to the data in memory
Greedy Learning [figure]
Lazy Learning [figure]
Greedy vs Lazy Learning
Greedy:
– Decision tree induction
  • CART, C4.5
– Rule induction
  • CN2, Ripper
– Hyperplane discriminators
  • Winnow, perceptron, backprop, SVM
– Probabilistic
  • Naïve Bayes, maximum entropy, HMM
– (Handmade rulesets)
Lazy:
– k-Nearest Neighbour
  • MBL, AM
  • Local regression
Greedy vs Lazy Learning
• Decision trees keep the smallest amount of informative decision boundaries (in the spirit of MDL; Rissanen, 1983)
• Rule induction keeps the smallest number of rules with the highest coverage and accuracy (MDL)
• Hyperplane discriminators keep just one hyperplane (or the vectors that support it)
• Probabilistic classifiers convert the data to probability matrices
• k-NN retains every piece of information available at training time
Greedy vs Lazy Learning
• Minimal Description Length principle:
  – Ockham's razor
  – Length of the abstracted model (covering the core)
  – Length of the productive exceptions not covered by the core (the periphery)
  – The sum of both sizes should be minimal
  – Smaller models are better
• The "learning = compression" dogma
• In ML the focus has been on the length of the abstracted model, not on storing the periphery
Greedy vs Lazy Learning
• + abstraction, + generalization: decision tree induction, hyperplane discriminators, regression
• + abstraction, − generalization: handcrafting
• − abstraction, − generalization: table lookup
• − abstraction, + generalization: memory-based learning
Feature weighting: IG

Feature 1 | Feature 2 | Class
A         | B         | X
A         | C         | Y
Feature weighting: IG
• Extreme examples of IG:
• Suppose the instance base has an entropy of 1.0
• An uninformative feature leaves the partitioned entropy at 1.0 (nothing happens), so its gain is 0.0
• An informative feature brings the partitioned entropy down to 0.0, so its gain is 1.0
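Both extremes can be reproduced with a short computation; the four-instance data set below is an illustrative assumption chosen so that the database entropy is exactly 1.0:

```python
# A minimal sketch of information gain (IG), showing the two extreme
# cases: an uninformative feature (gain 0.0) and a perfectly
# informative one (gain 1.0).
import math
from collections import Counter

def entropy(labels):
    """H = -sum_i p_i * log2(p_i) over the class distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(values, labels):
    """Database entropy minus the weighted entropy of the partition
    induced by one feature's values."""
    total = len(labels)
    partitioned = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        partitioned += (len(subset) / total) * entropy(subset)
    return entropy(labels) - partitioned

labels = ["X", "X", "Y", "Y"]          # database entropy is 1.0
uninformative = ["a", "b", "a", "b"]   # each value still mixes X and Y
informative = ["a", "a", "b", "b"]     # each value selects one class
print(information_gain(uninformative, labels))  # 0.0
print(information_gain(informative, labels))    # 1.0
```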
Entropy & IG: Formulas
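The formulas themselves did not survive extraction; a standard formulation consistent with the extreme examples above is:

```latex
% Entropy of instance base S, with p_i the proportion of class i
H(S) = -\sum_{i} p_i \log_2 p_i

% Information gain of feature f: database entropy minus the weighted
% entropy of the partition induced by f's values
G(S, f) = H(S) - \sum_{v \in \mathrm{values}(f)} \frac{|S_v|}{|S|}\, H(S_v)
```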