INTRODUCTION TO MACHINE LEARNING


Ivan Bratko
Univ. of Ljubljana, Slovenia
SOME KINDS OF MACHINE LEARNING

• Learning is distributed between a "teacher" and a learner
• Type of learning: how much work is done by each partner
[Diagram: the "teacher" passes examples and other information to the learner]
SOME KINDS OF MACHINE LEARNING

• By being told: the "teacher" tells all; the problem is to understand the teacher
• From examples (inductive learning): the "teacher" provides (possibly good) examples and the learner generalises
• By discovery: the learner plans experiments, possibly formulates the description language, and generalises from observations
• Learning from examples is the most practically important kind
SUPERVISED, UNSUPERVISED

• Supervised learning: examples are labelled (a class is given)
• Unsupervised learning: examples are not labelled (no class information is given); clustering
LEARNING ABOUT MUSHROOMS
[Scatter plot: labelled examples in the W–H plane; "+" marks edible, "-" marks poisonous mushrooms]

• Attributes: W (width) and H (height) of a mushroom
• Classes: poisonous, edible
IF 2 < W and W < 4 and H < 2 THEN “edible”
ELSE “poisonous”
HYPOTHESES ABOUT MUSHROOMS
[Scatter plot: the labelled examples with a hypothesis region drawn in the W–H plane]
HYPOTHESES ABOUT MUSHROOMS
[Scatter plot: the examples with the boundary lines H = W and H = 6 – W of Hypothesis 2]
IF H > W THEN “poisonous”
ELSE IF H > 6 – W THEN “poisonous”
ELSE “edible”
HYPOTHESES ABOUT MUSHROOMS
[Scatter plot: the examples with the parabolic boundary H = 3 – (W-3)^2 of Hypothesis 3]
IF H < 3 – (W-3)^2 THEN "edible"
ELSE "poisonous"
HYPOTHESES ABOUT MUSHROOMS

• All three hypotheses are consistent with the examples
• Some are more general than others
HYPOTHESES ABOUT MUSHROOMS
[Scatter plot: the three hypothesis boundaries H1, H2 and H3 drawn over the examples]
• All three hypotheses are consistent with the data
• There are differences when classifying new data
REPRESENTING HYPOTHESES
ABOUT MUSHROOMS

• Hypothesis 1:
If 2 < W and W < 4 and H < 2
then "edible" else "poisonous"

• Hypothesis 2:
If H > W then "poisonous"
else if H > 6 – W then "poisonous"
else "edible"

• Hypothesis 3:
If H < 3 – (W-3)^2 then "edible"
else "poisonous"
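The three hypotheses can be written directly as small predicates. A minimal sketch in Python, where the example coordinates are assumptions chosen to lie inside (or outside) all three regions, not data taken from the original slides:

```python
# The three mushroom hypotheses from the slides as Python predicates.

def h1(w, h):
    """Hypothesis 1: a rectangular region is edible."""
    return "edible" if 2 < w < 4 and h < 2 else "poisonous"

def h2(w, h):
    """Hypothesis 2: two half-planes are poisonous."""
    if h > w:
        return "poisonous"
    if h > 6 - w:
        return "poisonous"
    return "edible"

def h3(w, h):
    """Hypothesis 3: the region under a parabola is edible."""
    return "edible" if h < 3 - (w - 3) ** 2 else "poisonous"

# Assumed example points (not from the slides): edible mushrooms lie
# inside all three regions, poisonous ones outside.
examples = [((3.0, 1.0), "edible"), ((2.5, 1.5), "edible"),
            ((1.0, 3.0), "poisonous"), ((5.0, 2.0), "poisonous")]

# All three hypotheses are consistent with these examples.
for hyp in (h1, h2, h3):
    assert all(hyp(w, h) == cls for (w, h), cls in examples)
```

Running the loop confirms the point made on the slides: several quite different hypotheses can be consistent with the same training data.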
DECISION TREES

• Hypothesis 1 can be represented as a decision tree

Hypothesis 1 as an if-then rule:
IF 2 < W and W < 4 and H < 2
THEN "edible"
ELSE "poisonous"

[Decision tree: test W < 2, then W > 4, then H > 2; every True branch leads to "poisonous" (-), and if all three tests are False the leaf is "edible" (+)]
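The decision-tree representation of Hypothesis 1 can likewise be sketched as an explicit data structure plus a recursive classifier. The tuple encoding and the `classify` helper below are illustrative choices, not part of the original slides:

```python
# Hypothesis 1 as an explicit decision tree: each internal node is
# (test, true_branch, false_branch); a leaf is just the class name.

tree = ("W < 2",
        "poisonous",                 # W < 2: too thin, poisonous
        ("W > 4",
         "poisonous",                # W > 4: too wide, poisonous
         ("H > 2",
          "poisonous",               # H > 2: too tall, poisonous
          "edible")))                # all tests failed: edible

def classify(node, w, h):
    """Walk the tree from the root to a leaf for instance (w, h)."""
    if isinstance(node, str):        # reached a leaf
        return node
    test, if_true, if_false = node
    attr, op, threshold = test.split()
    value = w if attr == "W" else h
    holds = value < float(threshold) if op == "<" else value > float(threshold)
    return classify(if_true if holds else if_false, w, h)

print(classify(tree, 3, 1))   # edible
print(classify(tree, 5, 1))   # poisonous
```

The tree asks one attribute-value question per node, which is exactly why attribute-value hypotheses fit this representation so naturally.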
HYPOTHESIS LANGUAGES

• Attribute-value descriptions: refer to single attributes
If W < 4 and H > 2 then ...

• Relational descriptions: refer to relations among components
If H > W then ...
LEARNING RELATIONAL DESCRIPTIONS

• Learning the concept of an arch
Induced hypothesis:
• An arch consists of three rectangles A, B and C;
• A and B support C;
• A and B must not touch
LEARNING CONCEPTS FROM EXAMPLES

• Concepts as sets:
U is the set of all objects (the instance space)
A concept C is a subset: C ⊆ U

• To learn concept C means: learn to recognise, for all X in U, whether X is in C or X is not in C

[Diagram: concept C drawn as a subset of the instance space U, with an object X]
EXAMPLES OF CONCEPTS

• concept of an arch

• concept of poisonous mushrooms

• concept of a certain disease

• concept of multiplication:
U = set of tuples of numbers
Mult = { (a,b,c) | a*b = c }

• concept of movable
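A concept defined by a membership condition, such as Mult above, translates directly into a predicate over the instance space. A minimal sketch:

```python
# The concept Mult = {(a, b, c) | a*b = c} as a membership test:
# learning the concept means learning to recognise, for any tuple,
# whether it belongs to the set.

def in_mult(a, b, c):
    """True iff (a, b, c) is in the concept Mult."""
    return a * b == c

print(in_mult(3, 4, 12))   # True:  (3, 4, 12) is in Mult
print(in_mult(3, 4, 13))   # False: (3, 4, 13) is not
```

For a learner, of course, this predicate is unknown and must be induced from labelled tuples; the function above just makes the "concept = subset of U" view concrete.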
LEARNING FROM EXAMPLES
C : target concept
+ : positive examples
- : negative examples
C' : induced concept (learned), the hypothesis
ACCURACY OF INDUCED HYPOTHESIS

• (C - C') ∪ (C' - C) : errors, incorrect classifications

• Accuracy of induced concept C' =
proportion of correct classifications =
| U - (C - C') - (C' - C) | / | U |

• Concepts are stated in a concept description language (e.g. a decision tree)
TYPES OF ERRORS
[Diagram: overlapping sets, target concept C and induced concept C'; examples in C' - C are false positives, examples in C - C' are false negatives]
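The set-based accuracy formula and the two error types can be checked on small finite sets. The particular sets below, standing in for U, the target concept C and the induced concept C', are assumptions for illustration:

```python
# Accuracy of an induced concept, computed exactly as on the slide:
# accuracy = |U - (C - C') - (C' - C)| / |U|

U = set(range(10))           # assumed instance space of 10 objects
C = {0, 1, 2, 3, 4}          # assumed target concept
C_prime = {2, 3, 4, 5, 6}    # assumed induced concept C'

false_negatives = C - C_prime       # in C, but classified as negative
false_positives = C_prime - C       # not in C, but classified as positive

accuracy = len(U - false_negatives - false_positives) / len(U)

print(false_negatives)   # {0, 1}
print(false_positives)   # {5, 6}
print(accuracy)          # 0.6  (4 errors out of 10 instances)
```

Note that the errors split exactly into the two types from the slide: C - C' are the false negatives and C' - C the false positives.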
TERMINOLOGY

• Instance space: the space of all possible objects

• Target concept

• Hypothesis

• Examples: positive, negative

• Attributes, classes

• Hypothesis languages

• Hypothesis consistent with the examples

• Hypothesis H1 more specific than H2

• Hypothesis H2 more general than H1
BIAS IN MACHINE LEARNING

• What is bias?

• Restriction bias: limits the set of possible hypotheses; also called language bias

• Preference bias: orders the available hypotheses; also called search bias
EXAMPLES OF PREFERENCE BIAS

• maximally general hypothesis

• maximally specific hypothesis

• minimal description length (MDL)

• Occam's razor

• maximally compressive hypothesis:
encoding_length(Hypothesis) << encoding_length(Examples)
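The compression idea can be illustrated crudely by measuring description lengths in characters; this is only a stand-in for a real code length, and the example grid below is an assumption:

```python
# Crude MDL illustration: a compressive hypothesis is much shorter
# than a verbatim listing of the labelled examples it explains.
# Character counts stand in for real encoding lengths.

# Assumed example set: all integer (W, H) grid points, labelled by
# the first mushroom rule from the slides.
examples = [((w, h), "edible" if 2 < w < 4 and h < 2 else "poisonous")
            for w in range(7) for h in range(5)]

hypothesis = 'IF 2 < W and W < 4 and H < 2 THEN "edible" ELSE "poisonous"'

len_hypothesis = len(hypothesis)
len_examples = len(str(examples))

# encoding_length(Hypothesis) << encoding_length(Examples)
print(len_hypothesis, "<<", len_examples)
print(len_hypothesis < len_examples)   # True
```

Real MDL methods use proper probabilistic code lengths rather than character counts, but the inequality on the slide is the same: prefer the hypothesis that compresses the data.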
OCCAM’S RAZOR

• William of Occam (Ockham), 1320

• Applied in experimental science as Occam's razor:
"Entities should not be multiplied unnecessarily"
Given two explanations of the data, all other things being equal, the simpler explanation is preferable.
CRITERIA OF SUCCESS IN MACHINE LEARNING

• Accuracy of the induced hypotheses

• Comprehensibility ("understandability", interpretability) of the induced hypotheses

• In practice, comprehensibility is often more important than accuracy
RELEVANT OTHER DISCIPLINES

• Artificial Intelligence

• Statistics

• Bayesian methods

• Computational complexity theory

• Control theory

• Information theory
RELEVANT DISCIPLINES, CTD.

• Cognitive science, psychology and neurobiology: how do humans learn?

• Philosophy:
D. Gillies, implications of AI for the philosophy of science
SOME BOOKS ON ML

• The following are two excellent standard textbooks on ML:
Mitchell, T.M. (1997) Machine Learning. McGraw-Hill.
Witten, I.H., Frank, E. (2005) Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition. Elsevier.
Mitchell gives broader coverage of learning paradigms and goes into greater theoretical depth. Witten & Frank is more practical and gives more detail on attribute-value learning.

• The following book covers many additional topics in ML in a more compact manner:
Kononenko, I., Kukar, M. (2007) Machine Learning and Data Mining: Introduction to Principles and Algorithms. Chichester, UK: Ellis Horwood.