Lerende Machinekes
Machine learning: Introduction and classical ML algorithms (1)
26 April 2006
Antal van den Bosch
Machine Learning
- The field of machine learning is concerned with the question of how to construct computer programs that automatically learn with experience (Mitchell, 1997).
- Dynamic process: learner L shows improvement on task T after learning.
- Getting rid of programming.
- Handcrafting versus learning.
- Machine Learning is task-independent.
Machine Learning: Roots
- Information theory
- Artificial intelligence
- Pattern recognition
- Took off during the 1970s
- Major algorithmic improvements during the 1980s
- Forking: neural networks, data mining
Machine Learning: 2 strands
- Theoretical ML (what can be proven to be learnable, and by what?)
  - Gold: identification in the limit
  - Valiant: probably approximately correct (PAC) learning
- Empirical ML (on real or artificial data)
  - Evaluation criteria:
    - Accuracy
    - Quality of solutions
    - Time complexity
    - Space complexity
    - Noise resistance
Empirical ML: Key Terms 1
- Instances: individual examples of input-output mappings of a particular type
- Input consists of features
- Features have values
- Values can be:
  - Symbolic (e.g. letters, words, …)
  - Binary (e.g. indicators)
  - Numeric (e.g. counts, signal measurements)
- Output can be:
  - Symbolic (classification: linguistic symbols, …)
  - Binary (discrimination, detection, …)
  - Numeric (regression)
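As a concrete illustration of these terms, the snippet below sketches one instance with symbolic, binary, and numeric feature values and a symbolic class label; the feature names and the word-classification task are invented for the example, not taken from the slides.

    # Hypothetical instance for a word-classification task (names are
    # illustrative only). Each feature has a value; the output here is a
    # symbolic class, so this would be a classification problem.
    instance = {
        "features": {
            "previous_word": "the",   # symbolic value
            "is_capitalized": 0,      # binary value
            "word_length": 3,         # numeric value
        },
        "class": "NOUN",              # symbolic output
    }

    # An instance base is simply a collection of such instances.
    instance_base = [instance]
    print(len(instance_base), "instance(s) in the instance base")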
Empirical ML: Key Terms 2
- A set of instances is an instance base
- Instance bases come as labeled training sets or unlabeled test sets (you know the labeling, the learner does not)
- An ML experiment consists of training on the training set, followed by testing on the disjoint test set
- Generalisation performance (accuracy, precision, recall, F-score) is measured on the output predicted for the test set
- Splits into train and test sets should be systematic: n-fold cross-validation (sketched in code below)
  - 10-fold CV
  - Leave-one-out testing
- Significance tests on pairs or sets of (average) CV outcomes
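A minimal sketch of the 10-fold cross-validation procedure described above, measuring average accuracy over the folds; the use of scikit-learn, the iris data, and a 3-nearest-neighbour classifier are assumptions made for the example, not prescriptions from the slides.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)

    scores = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=1).split(X):
        clf = KNeighborsClassifier(n_neighbors=3)
        clf.fit(X[train_idx], y[train_idx])            # train on the training fold
        predictions = clf.predict(X[test_idx])         # test on the disjoint fold
        scores.append(accuracy_score(y[test_idx], predictions))

    # Average generalisation accuracy over the 10 folds
    print(sum(scores) / len(scores))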
Empirical ML: 2 Flavours
- Greedy
  - Learning: abstract a model from the data
  - Classification: apply the abstracted model to new data
- Lazy
  - Learning: store the data in memory
  - Classification: compare new data to the data in memory
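A toy, purely illustrative contrast of the two flavours (the data and the "model" are made up for the example): the greedy learner abstracts a model from the data, here simply the majority class, and can discard the instances afterwards; the lazy learner stores the instances and compares new data to them at classification time.

    from collections import Counter

    train = [({"f1": "a", "f2": "b"}, "X"),
             ({"f1": "a", "f2": "c"}, "Y"),
             ({"f1": "a", "f2": "c"}, "Y")]

    # Greedy: learning abstracts a model (here: the majority class).
    majority_class = Counter(label for _, label in train).most_common(1)[0][0]

    def classify_greedy(instance):
        return majority_class                 # apply the abstracted model to new data

    # Lazy: learning just stores the data; classification compares the new
    # instance to memory (1-nearest neighbour by feature-value overlap).
    memory = list(train)

    def classify_lazy(instance):
        def overlap(example):
            features, _ = example
            return sum(instance[f] == v for f, v in features.items())
        _, label = max(memory, key=overlap)
        return label

    new = {"f1": "a", "f2": "b"}
    print(classify_greedy(new), classify_lazy(new))   # majority says Y, nearest neighbour says X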
Greedy learning [figure]
Lazy Learning [figure]
Greedy vs Lazy Learning
Greedy:
- Decision tree induction: CART, C4.5
- Rule induction: CN2, Ripper
- Hyperplane discriminators: Winnow, perceptron, backprop, SVM
- Probabilistic: Naïve Bayes, maximum entropy, HMM
- (Hand-made rulesets)
Lazy:
- k-Nearest Neighbour: MBL, AM
- Local regression
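To make the contrast concrete with one learner from each column, the sketch below trains a CART-style decision tree (greedy) and a k-nearest-neighbour classifier (lazy) with scikit-learn; the dataset, split, and parameter settings are assumptions for illustration only.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier      # greedy: CART-style tree induction
    from sklearn.neighbors import KNeighborsClassifier   # lazy: k-nearest neighbour

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)   # abstracts a tree from the data
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)       # stores the training instances

    print("decision tree accuracy:", tree.score(X_test, y_test))
    print("k-NN accuracy:", knn.score(X_test, y_test))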
Greedy vs Lazy Learning
- Decision trees keep only the smallest set of informative decision boundaries (in the spirit of MDL; Rissanen, 1983)
- Rule induction keeps the smallest number of rules with the highest coverage and accuracy (MDL)
- Hyperplane discriminators keep just one hyperplane (or the vectors that support it)
- Probabilistic classifiers convert the data to probability matrices
- k-NN retains every piece of information available at training time
Greedy vs Lazy Learning
- Minimal Description Length principle: Ockham's razor
  - Length of the abstracted model (covering the core)
  - Length of the productive exceptions not covered by the core (the periphery)
  - The sum of both sizes should be minimal
  - More minimal models are better
- "Learning = compression" dogma
- In ML, the length of the abstracted model has been the focus; the periphery is not stored
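Stated as a formula (the notation is chosen here, not given on the slide): writing L(·) for description length in bits, M for the abstracted core model and E for the peripheral exceptions it does not cover, the MDL principle prefers the hypothesis

    \hat{M} \;=\; \arg\min_{M} \bigl[\, L(M) + L(E \mid M) \,\bigr]

i.e. the model for which the summed size of core and periphery is smallest.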
Greedy vs Lazy Learning
[figure: methods arranged along two axes, +/- abstraction and +/- generalization: Decision Tree Induction, Hyperplane discriminators, Regression, Handcrafting, Table Lookup, Memory-Based Learning]
Feature weighting: IG

  Feature 1   Feature 2   Class
  A           B           X
  A           C           Y
Feature weighting: IG
- Extreme examples of IG:
  - Suppose a database entropy of 1.0
  - An uninformative feature will have a partitioned entropy of 1.0 (nothing happens), so a gain of 0.0
  - An informative feature will have a partitioned entropy of 0.0, so a gain of 1.0
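A small sketch (the code is illustrative, not from the handout) that reproduces both extremes on the two-instance table shown earlier: Feature 1 takes the same value for both classes and gains 0.0, while Feature 2 separates the classes perfectly and gains the full database entropy of 1.0.

    from collections import Counter
    from math import log2

    def entropy(labels):
        total = len(labels)
        return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

    def information_gain(rows, feature):
        base = entropy([row[-1] for row in rows])          # database entropy
        remainder = 0.0
        for value in set(row[feature] for row in rows):
            subset = [row[-1] for row in rows if row[feature] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return base - remainder                            # gain = base - partitioned entropy

    # The two-instance example: (Feature 1, Feature 2, Class)
    data = [("A", "B", "X"),
            ("A", "C", "Y")]

    print(information_gain(data, 0))   # Feature 1: uninformative -> 0.0
    print(information_gain(data, 1))   # Feature 2: informative   -> 1.0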
Entropy & IG: Formulas
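The formulas themselves are not included in the handout text; the standard definitions presumably intended here are, for an instance base S with class proportions p_c and a feature F with values v:

    H(S) \;=\; -\sum_{c} p_c \log_2 p_c

    IG(S, F) \;=\; H(S) \;-\; \sum_{v \in \text{values}(F)} \frac{|S_v|}{|S|}\, H(S_v)

where S_v is the subset of S in which feature F has value v; the gain is the database entropy minus the weighted entropy of the partitions induced by F.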