Introduction to Machine Learning (4h: GB) - LIG members


Oct 14, 2013


Learning is acquiring new knowledge, behaviors, or skills, and may involve synthesizing different types of information. Learning may occur as a result of habituation or classical conditioning, or of more complex activities such as play or study.
« Learning is constructing or modifying representations of what is being experienced » (R. Michalski)
« Learning aims at increasing the performance of a system on a given task by using a set of experiences » (T. Mitchell)
[Diagram: Task → Learning → Model M; Data → Learning → Model M → Task]
Online learning is needed when a system must be able to adapt its behavior «rapidly». Batch learning can be used to analyze an existing dataset in order to generate a model.
- Supervised learning, or discrimination
- Unsupervised learning, or clustering
- Case-based reasoning
- Reinforcement learning
[Diagrams: Learning produces a model M used to answer queries («?») and to drive a system (SE); concept descriptions are learned from the environment; meta-learning: predict and interact with the environment.]
Learning
          | Weight    | Life span
E1        | 3.5       | 12
E2        | 4         | 15
E3        | 5.2       | 11
Mean ± sd | 4.2 ± 0.9 | 12.7 ± 2.1
Statistics is the study of how to collect, represent, analyze, explain ... datasets.
Acquisition ≠ Transfer
Acquisition = Model
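The summary row of the small E1-E3 table can be reproduced with NumPy; note that the quoted figures correspond to the sample standard deviation (`ddof=1`), a minimal sketch:

```python
# Reproducing the mean ± standard deviation row of the E1-E3 table.
import numpy as np

weight = np.array([3.5, 4.0, 5.2])
life_span = np.array([12, 15, 11])

# ddof=1 gives the sample (unbiased) standard deviation, matching the slide.
print(f"{weight.mean():.1f} ± {weight.std(ddof=1):.1f}")      # 4.2 ± 0.9
print(f"{life_span.mean():.1f} ± {life_span.std(ddof=1):.1f}")  # 12.7 ± 2.1
```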
Molecule | A-Cycles | Mass  | pH | Carboxyl | Activity
M1       | 1        | low   | <5 | false    | null
M2       | 2        | mean  | <5 | true     | toxic
M3       | 0        | mean  | >8 | true     | toxic
M4       | 0        | mean  | <5 | false    | null
M5       | 1        | heavy | ~7 | false    | null
M6       | 2        | heavy | >8 | false    | toxic
M7       | 1        | heavy | >8 | false    | toxic
M8       | 0        | low   | <5 | true     | toxic
pH?
  <5 → Carboxyl?
        false → null  (M1, M4)
        true  → toxic (M2, M8)
  ~7 → null  (M5)
  >8 → toxic (M3, M6, M7)
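The decision tree above can be transcribed directly as a function; a minimal sketch (the function and attribute names are mine), checked against the eight molecules of the table:

```python
# Direct transcription of the learned decision tree (the model, not the
# learning algorithm). pH values are the discretized levels of the table.
def predict_activity(ph: str, carboxyl: bool) -> str:
    """Classify a molecule as 'toxic' or 'null' from pH and carboxyl."""
    if ph == "<5":
        # Below pH 5, the carboxyl group decides the class.
        return "toxic" if carboxyl else "null"
    if ph == "~7":
        return "null"      # M5
    return "toxic"         # pH > 8: M3, M6, M7

# The tree reproduces the labels of all eight training molecules:
dataset = {
    "M1": ("<5", False, "null"),  "M2": ("<5", True,  "toxic"),
    "M3": (">8", True,  "toxic"), "M4": ("<5", False, "null"),
    "M5": ("~7", False, "null"),  "M6": (">8", False, "toxic"),
    "M7": (">8", False, "toxic"), "M8": ("<5", True,  "toxic"),
}
assert all(predict_activity(ph, cb) == label
           for ph, cb, label in dataset.values())
```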
A learning problem is characterized along three axes: the environment (the available data), the objectives, and the model.
Kind of data:
- Labeled: S = {(x_i, u_i), ...}
- Unlabeled: S = {x_i, ...}
Type of the data:
- Numerical
- Complex: sequences, graphs, ...
Availability of the data:
- Database (batch learning)
- Incremental (online learning)
- Selectable (active learning)
Classification: u_i = h(x_i)
- Discrimination: h discrete
- Ranking: h ordered
- Regression: h continuous
Discovery:
- Clustering: h(x_i) → C_j (partition, hierarchy, ...)
- Association rules
- Grammatical inference, ...
Optimization:
- Reinforcement learning
- Planning ...
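To make the supervised/unsupervised distinction concrete, here is a minimal sketch on toy 1-D data (the data and function names are mine): a discrimination objective where the labels u_i drive the learned h, next to a clustering objective where groups emerge from the data alone.

```python
# Supervised objective: u = h(x), h discrete -- a 1-nearest-neighbour rule.
def discrimination(train, x):
    """Return the label of the closest labeled example."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Unsupervised objective: h(x) -> C_j -- a tiny 1-D k-means.
def clustering(points, k=2, steps=10):
    """Return k cluster centers found by alternating assign/update."""
    centers = points[:k]
    for _ in range(steps):
        groups = [[] for _ in range(k)]
        for x in points:
            groups[min(range(k), key=lambda j: abs(centers[j] - x))].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

train = [(1.0, "low"), (1.2, "low"), (8.0, "high"), (8.5, "high")]
print(discrimination(train, 2.0))              # "low"
print(clustering([1.0, 1.2, 8.0, 8.5]))
```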
«Symbolical»: focus on understandability
- Decision trees
- Horn clauses
- Semantic networks
- ...
«Numerical»: focus on efficiency
- Hyperplane parameters
- Neural networks
- Bayesian networks
- ...
5) Tuning of the input (revision step)
Empirical ≠ Semantical
What's a «good» model?
- Astrology («Sirius»)
- Ptolemy's model
- Copernicus' model
- Kepler's laws
- Newton's theory
- Titius-Bode law: d = 0.4 + (0.3 × 2^n)
- Plate tectonics
- Darwin's theory
- n-body problem
- Balmer's law
- Quantum mechanics
entia non sunt multiplicanda praeter necessitatem
«Entities should not be multiplied unnecessarily»
In ML as in the rest of Computer Science: «Garbage In, Garbage Out».
Anscombe's quartet: four datasets with (nearly) identical summary statistics:

  I           II          III          IV
 X    Y      X    Y      X     Y      X     Y
10   8.04   10   9.14   10    7.46    8    6.58
 8   6.95    8   8.14    8    6.77    8    5.76
13   7.58   13   8.74   13   12.74    8    7.71
 9   8.81    9   8.77    9    7.11    8    8.84
11   8.33   11   9.26   11    7.81    8    8.47
14   9.96   14   8.10   14    8.84    8    7.04
 6   7.24    6   6.13    6    6.08    8    5.25
 4   4.26    4   3.10    4    5.39   19   12.50
12  10.84   12   9.13   12    8.15    8    5.56
 7   4.82    7   7.26    7    6.42    8    7.91
 5   5.68    5   4.74    5    5.73    8    6.89

Statistic                   | Value (same for all four datasets)
Mean of X                   | 9
Variance of X               | 11
Mean of Y                   | ~7.50
Variance of Y               | ~4.12
Correlation between X and Y | 0.816
Linear regression           | Y = 0.5X + 3
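These statistics can be checked with NumPy; the sketch below recomputes the mean, correlation, and regression line for each of the four datasets, showing that the summaries coincide even though the data clearly do not:

```python
# Recomputing the shared summary statistics of Anscombe's quartet.
import numpy as np

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
for name, (x, y) in quartet.items():
    x, y = np.array(x, float), np.array(y, float)
    slope, intercept = np.polyfit(x, y, 1)       # least-squares line
    print(name, x.mean(), round(y.mean(), 2),
          round(np.corrcoef(x, y)[0, 1], 3),
          round(slope, 2), round(intercept, 2))
```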
Vectorial data:
- Table: rows are instances, columns are variables (attributes)
- Example instance: mass=167 ∧ number_cycle=1 ∧ contain_Br=no ∧ ...
- Numerical (N) models: vector of parameters, hyperplanes, probabilities, ...
- Symbolic (S) models: propositional logic (conjunctions, disjunctions of attributes), rules (knowledge-based systems), e.g.
  IF (mass < 500) ∧ (LogP > 5) THEN (potential_drug = true)

Relational data:
- Example instance: bond(m1, c1, Cl, simple), bond(m1, c1, c2, single), ... (m1)
- Models: graphs, predicate logic, conceptual graphs, Horn clauses, e.g.
  mutagenic(M) :- bond(M, Atom1, Atom2, double), has_ring(M, R, 5), bond(M, R, Atom1, single), is(Atom1, Br), ...
Plant    | A: Temperature | B: Dryness | Survival
Plant 1  | 2              | 2.4        | +
Plant 2  | 4              | 3.5        | -
Plant 3  | 8              | 1          | +
Plant 4  | 8              | 7          | -
...      | ...            | ...        | ...
Plant 19 | 3              | 9.5        | -
[Figures: the plant examples plotted in the (A: Temperature, B: Dryness) plane, axes graduated 1 to 9, positive (+) and negative (-) points; three copies of the plot.]
Accurate:
- h must recognize the positive examples
- h must reject the negative examples
Plausible:
- h must be general (few conjunctions)
- h must be simple (few disjunctions)
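The accuracy requirements can be checked mechanically. A minimal sketch, with h a conjunction of interval tests on A (temperature) and B (dryness); the thresholds are illustrative, not those of the figures:

```python
# A candidate hypothesis h: a conjunction of two interval tests.
# The thresholds are illustrative, not taken from the course figures.
def h(a, b):
    return a >= 5 and b <= 5     # "temperature high AND dryness low"

positives = [(2, 2.4), (8, 1)]            # Plants 1 and 3
negatives = [(4, 3.5), (8, 7), (3, 9.5)]  # Plants 2, 4, and 19

# Accurate = recognize the positives and reject the negatives.
recognized = sum(h(a, b) for a, b in positives)
rejected = sum(not h(a, b) for a, b in negatives)
print(f"recognized {recognized}/{len(positives)} positives, "
      f"rejected {rejected}/{len(negatives)} negatives")
```

Here h rejects every negative but misses Plant 1, so it is not yet accurate and would need revision.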
[Figures: positive and negative examples in the instance space X; the target concept C and candidate hypotheses h_i, h'_j drawn from the hypothesis space H; with N binary attributes there are 2^N possible instances and 2^(2^N) candidate concepts.]
[Figure: model quality as a function of time: a suboptimal model obtained quickly versus the optimal model at time T; pie chart of a 67 % learning / 33 % test split.]
Set            | 2 parts | 3 parts | Role
Training set   | 66 %    | 50 %    | Used to learn the model
Validation set | -       | 25 %    | Helps to tune the learning parameters (pruning, stability)
Test set       | 33 %    | 25 %    | Measures the accuracy of the model
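The three-part scheme can be sketched in plain Python; the helper below (its name and proportions follow the 50/25/25 column, but are otherwise mine) shuffles before cutting so each part is representative:

```python
# Splitting a dataset into training / validation / test sets.
import random

def split(dataset, train=0.5, valid=0.25, seed=0):
    """Shuffle, then cut into train / validation / test parts."""
    data = list(dataset)
    random.Random(seed).shuffle(data)        # shuffle before splitting
    n_train = int(train * len(data))
    n_valid = int(valid * len(data))
    return (data[:n_train],                      # learn the model
            data[n_train:n_train + n_valid],     # tune the parameters
            data[n_train + n_valid:])            # measure the accuracy

train_set, valid_set, test_set = split(range(100))
print(len(train_set), len(valid_set), len(test_set))   # 50 25 25
```

For the two-part scheme, call `split(data, train=0.66, valid=0.0)` and ignore the empty validation part.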
[Figure: cross-validation: the dataset is cut into folds N1 to N5; each model M_i is learned on the remaining folds and evaluated on the held-out one.]
                         | Real label: class=+ | Real label: class=-
Predicted label: class=+ | A (true positives)  | B (false positives)
Predicted label: class=- | C (false negatives) | D (true negatives)
Recognition rate = (A + D) / (A + B + C + D)   (= 1 - error rate)
Medical domain:
- Sensitivity = A / (A + C): probability that a test result will be positive when the disease is present
- Specificity = D / (B + D): probability that a test result will be negative when the disease is not present
IR domain:
- Precision = A / (A + B)
- Recall = A / (A + C)
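All of these measures follow directly from the four confusion-matrix counts; a minimal sketch with illustrative counts (the numbers are mine, not from the course):

```python
# Evaluation measures computed from the confusion-matrix counts
# A (true positives), B (false positives), C (false negatives),
# D (true negatives).
def metrics(A, B, C, D):
    return {
        "recognition_rate": (A + D) / (A + B + C + D),  # = 1 - error rate
        "sensitivity": A / (A + C),                     # = recall
        "specificity": D / (B + D),
        "precision":   A / (A + B),
    }

m = metrics(A=40, B=10, C=5, D=45)   # illustrative counts
print(m)
```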
[Figure: error rate versus increasing complexity of L_h: the learning error keeps decreasing while the generalization error eventually increases again (overfitting).]
[Figure: the examples in the (A, B) plane with decision boundaries from three hypothesis languages L_H1, L_H2, L_H3.]
Machine learning research