INTRODUCTION TO
MACHINE LEARNING
Ivan Bratko
Univ. of Ljubljana, Slovenia
SOME KINDS OF MACHINE LEARNING
• Learning distributed between “teacher” and learner
• Type of learning: how much work is done by each partner
[Figure: the “teacher” passes examples and other information to the learner]
SOME KINDS OF MACHINE LEARNING
• By being told:
“teacher” tells all; the problem is to understand the teacher
• From examples (inductive learning):
“teacher” provides (possibly good) examples,
“learner” generalises
• By discovery:
learner plans experiments, possibly formulates a
description language, and generalises from
observations
• Learning from examples:
the most practically important
SUPERVISED, UNSUPERVISED
• Supervised learning: examples are labelled (given
class)
• Unsupervised learning: examples are not labelled (no
class information given); clustering
LEARNING ABOUT MUSHROOMS
[Figure: example mushrooms plotted by width W (0–6) and height H (0–4); “+” and “–” mark the two classes]
• Attributes: W (width) and H (height) of mushroom
• Classes: poisonous, edible
IF 2 < W and W < 4 and H < 2 THEN “edible”
ELSE “poisonous”
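This rule can be written directly as a small classifier; a minimal Python sketch (the function name `classify` is ours, not from the slides):

```python
def classify(w, h):
    """Classify a mushroom by Hypothesis 1: edible iff 2 < W < 4 and H < 2."""
    if 2 < w < 4 and h < 2:
        return "edible"
    return "poisonous"

print(classify(3, 1))   # a narrow, short mushroom -> "edible"
print(classify(5, 3))   # a wide, tall mushroom   -> "poisonous"
```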
HYPOTHESES ABOUT MUSHROOMS
[Figure: the examples in the W–H plane with the decision boundary of Hypothesis 1]
HYPOTHESES ABOUT MUSHROOMS
[Figure: the examples with the piecewise-linear decision boundary of Hypothesis 2 (the lines H = W and H = 6 – W)]
IF H > W THEN “poisonous”
ELSE IF H > 6 – W THEN “poisonous”
ELSE “edible”
HYPOTHESES ABOUT MUSHROOMS
[Figure: the examples with the parabolic decision boundary of Hypothesis 3 (the curve H = 3 – (W – 3)²)]
IF H < 3 – (W – 3)² THEN “edible”
ELSE “poisonous”
HYPOTHESES ABOUT MUSHROOMS
• All three hypotheses are consistent with the
examples
• Some are more general than others
HYPOTHESES ABOUT MUSHROOMS
[Figure: the examples with all three decision boundaries H1, H2 and H3 overlaid in the W–H plane]
• All three hypotheses are consistent with the data
• There are differences when classifying new data
REPRESENTING HYPOTHESES
ABOUT MUSHROOMS
• Hypothesis 1:
If 2 < W and W < 4 and H < 2
then "edible" else "poisonous"
• Hypothesis 2:
If H > W then "poisonous"
else if H > 6 – W then "poisonous"
else "edible"
• Hypothesis 3:
If H < 3 – (W – 3)² then "edible"
else "poisonous"
DECISION TREES
• Hypothesis 1 can be represented as a decision tree
Hypothesis 1 as an if-then rule:
IF 2 < W and W < 4 and H < 2
THEN “edible”
ELSE “poisonous”
[Decision tree:
W < 2?  True → “poisonous”;  False ↓
W > 4?  True → “poisonous”;  False ↓
H > 2?  True → “poisonous”;  False → “edible”]
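The decision tree reads naturally as nested conditionals; a Python sketch (the name `tree_classify` is ours, and boundary cases follow the tree's tests literally):

```python
def tree_classify(w, h):
    """Walk the decision tree for Hypothesis 1 from root to leaf."""
    if w < 2:
        return "poisonous"
    if w > 4:
        return "poisonous"
    if h > 2:
        return "poisonous"
    return "edible"

print(tree_classify(3, 1))  # edible
print(tree_classify(1, 3))  # poisonous
```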
HYPOTHESIS LANGUAGES
• Attribute-value descriptions:
refer to single attributes
If W < 4 and H > 2 then ...
• Relational descriptions:
refer to relations among components
If H > W then ...
LEARNING RELATIONAL DESCRIPTIONS
• Learning the concept of an arch
Induced hypothesis:
• An arch consists of three rectangles A, B and C
• A and B support C
• A and B must not touch
LEARNING CONCEPTS FROM EXAMPLES
Concepts as sets:
U, the set of all objects (instance space)
A concept C: C ⊆ U
To learn concept C means: learn to recognise,
for all X in U, whether X is in C or X is not in C
[Figure: the instance space U containing the concept C, with an object X in U]
EXAMPLES OF CONCEPTS
concept of an arch
concept of poisonous mushrooms
concept of a certain disease
concept of multiplication:
U = set of tuples of numbers
Mult = { (a,b,c) | a*b = c }
concept of movable
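Concept membership is just a predicate on objects; a minimal Python sketch for the Mult concept (the name `in_mult` is ours):

```python
def in_mult(triple):
    """Is the triple (a, b, c) in the concept Mult = { (a,b,c) | a*b = c }?"""
    a, b, c = triple
    return a * b == c

print(in_mult((2, 3, 6)))   # True: (2, 3, 6) is in the concept
print(in_mult((2, 3, 7)))   # False
```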
LEARNING FROM EXAMPLES
C – target concept
+ positive examples
– negative examples
C′ – induced concept (learned), hypothesis
ACCURACY OF INDUCED
HYPOTHESIS
• (C \ C′) ∪ (C′ \ C): errors, incorrect classifications
• Accuracy of induced concept C′ =
proportion of correct classifications =
( |U| – |(C \ C′) ∪ (C′ \ C)| ) / |U|
• Concepts are stated in a concept description
language (e.g. decision tree)
TYPES OF ERRORS
[Figure: the target concept C and the induced concept C′ overlapping in the instance space]
False positive: X in C′ but not in C
False negative: X in C but not in C′
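Accuracy and the two error types can be computed with set operations; a Python sketch over a small made-up instance space (U, C and Cp below are illustrative, with Cp playing the role of C′):

```python
U = {1, 2, 3, 4, 5, 6, 7, 8}   # instance space
C = {1, 2, 3, 4}               # target concept
Cp = {3, 4, 5}                 # induced concept C'

errors = (C - Cp) | (Cp - C)               # (C \ C') ∪ (C' \ C)
accuracy = (len(U) - len(errors)) / len(U)
false_positives = Cp - C                   # in C' but not in C
false_negatives = C - Cp                   # in C but not in C'

print(accuracy)          # 0.625
print(false_positives)   # {5}
print(false_negatives)   # {1, 2}
```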
TERMINOLOGY
Instance space, space of all possible objects
Target concept
Hypothesis
Examples: positive, negative
Attributes, classes
Hypothesis languages
Hypothesis consistent with examples
Hypothesis H1 more specific than H2
Hypothesis H2 more general than H1
BIAS IN MACHINE LEARNING
What is bias?
Restriction bias: limits the set of possible hypotheses;
also called language bias
Preference bias: orders available hypotheses;
also called search bias
EXAMPLES OF PREFERENCE BIAS
maximally general hypothesis
maximally specific hypothesis
minimal description length (MDL)
Occam's razor
maximally compressive hypothesis:
encoding_length(Hypothesis) <<
encoding_length(Examples)
OCCAM’S RAZOR
• William of Occam (Ockham), c. 1320
• Applied in experimental science as Occam’s razor:
“Entities should not be multiplied unnecessarily”
Given two explanations of the data, all other
things being equal, the simpler explanation
is preferable.
CRITERIA OF SUCCESS IN MACHINE
LEARNING
Accuracy of induced hypotheses
Comprehensibility (“understandability”, interpretability)
of induced hypotheses
In practice, comprehensibility is often more
important than accuracy
RELEVANT OTHER DISCIPLINES
• Artificial Intelligence
• Statistics
• Bayesian methods
• Computational complexity theory
• Control theory
• Information theory
RELEVANT DISCIPLINES, CTD.
• Cognitive science, psychology and neurobiology:
how do humans learn?
• Philosophy:
D. Gillies, implications of AI for the philosophy of science
SOME BOOKS ON ML
• The following are two excellent standard textbooks on ML:
Mitchell, T.M. (1997) Machine Learning. McGraw-Hill.
Witten, I.H., Frank, E. (2005) Data Mining: Practical Machine Learning Tools and
Techniques, 2nd edition. Elsevier.
Mitchell gives broader coverage of learning paradigms and goes into greater
theoretical depth. Witten & Frank is more practical and covers attribute-value
learning in more detail.
• The following book covers many additional topics in ML in a more compact
manner:
Kononenko, I., Kukar, M. (2007) Machine Learning and Data Mining:
Introduction to Principles and Algorithms. Chichester, UK: Horwood Publishing.