Learning in AI


Learning in AI
Srinandan Dasmahapatra
May 2, 2006
Rolf Pfeifer (Chris Langton)
In the context of A-life
Learning for Design:
because the world cannot be programmed into the machine
Knowledge Representation:
The Requirement
Learning: General Remarks
Learning: the tasks and some examples
Bayesian Supervised Learning
Probabilities from the training set
Labelled examples (c_i) allow you to pick out those for each label. Then count how many there are in each labelled class:
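Presumably the slide showed the empirical estimate P(c_i) = N_i / N, the fraction of training examples carrying label c_i. A minimal Python sketch (not from the original notes), assuming examples are given as (attribute-dict, label) pairs:

from collections import Counter

def class_priors(examples):
    """Estimate P(c_i) as the fraction of training examples with label c_i."""
    labels = [label for _, label in examples]   # examples: (attributes, label) pairs
    counts = Counter(labels)
    return {c: n / len(labels) for c, n in counts.items()}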
Example training set:
On which days would you want to play tennis?
Naïve Bayes Classifier
Why naïve?
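Naïve because the classifier assumes the attributes are conditionally independent given the class, so P(x | c) factorises into a product of per-attribute probabilities estimated by counting. A minimal sketch under that assumption; the data format and the names train_naive_bayes and classify are illustrative, not from the slides:

from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Estimate P(c) and P(attribute = value | c) by counting.
    examples: list of (attribute_dict, label) pairs."""
    n = len(examples)
    class_counts = Counter(label for _, label in examples)
    priors = {label: count / n for label, count in class_counts.items()}
    cond_counts = defaultdict(Counter)   # (attribute, label) -> Counter of values
    for attrs, label in examples:
        for attribute, value in attrs.items():
            cond_counts[(attribute, label)][value] += 1
    def likelihood(attribute, value, label):
        return cond_counts[(attribute, label)][value] / class_counts[label]
    return priors, likelihood

def classify(attrs, priors, likelihood):
    """Naive Bayes decision: argmax_c P(c) * prod_a P(a = v | c),
    i.e. the attributes are treated as independent given the class."""
    def score(label):
        s = priors[label]
        for attribute, value in attrs.items():
            s *= likelihood(attribute, value, label)
        return s
    return max(priors, key=score)

For instance, classify({"Outlook": "sunny", "Humidity": "normal", "Wind": "weak"}, *train_naive_bayes(training_examples)) would return whichever PlayTennis label scores higher under these estimates.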
Decision Trees for Classification

Equivalent forms of answering the question

When at least one of the following three conditions is satisfied:
1. Outlook = overcast
2. Outlook = rain AND Wind = weak
3. Outlook = sunny AND Humidity = normal
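As a hypothetical illustration (not code from the slides), the same rule written directly as a Boolean predicate in disjunctive normal form:

def play_tennis(outlook, wind, humidity):
    """PlayTennis = yes when at least one of the three conditions holds."""
    return ((outlook == "overcast")
            or (outlook == "rain" and wind == "weak")
            or (outlook == "sunny" and humidity == "normal"))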
Learning Disjunctive Normal Forms (DNF)
Learning Decision Trees
(Supervised Learning)

How can one learn such trees from data alone?
Basic Algorithm for Growing a Decision Tree
To decide which attribute is best, choose the attribute that produces the best discrimination between the classes.
In the ID3 algorithm and its variants, the attribute chosen is the most informative one (in the technical sense of information theory).
Find the best attribute (colour or shape) to sift the + examples from the - examples.
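A minimal ID3-style sketch of the growing procedure. The data format (lists of (attribute-dict, label) pairs) and all names are assumptions made for illustration, not taken from the slides:

import math
from collections import Counter

def entropy(labels):
    """H = -sum_i p_i log2 p_i over the class proportions at a node."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def grow_tree(examples, attributes):
    """Recursively grow a decision tree.
    examples  : list of (attribute_dict, label) pairs
    attributes: attribute names still available for splitting"""
    labels = [label for _, label in examples]
    # Base cases: the node is pure, or there is nothing left to split on.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority label
    # Pick the attribute whose split lowers the weighted child entropy the most.
    def remainder(attribute):
        rem = 0.0
        for value in set(attrs[attribute] for attrs, _ in examples):
            subset = [lbl for attrs, lbl in examples if attrs[attribute] == value]
            rem += len(subset) / len(examples) * entropy(subset)
        return rem
    best = min(attributes, key=remainder)   # maximum gain = minimum remainder
    tree = {"attribute": best, "branches": {}}
    for value in set(attrs[best] for attrs, _ in examples):
        subset = [(attrs, lbl) for attrs, lbl in examples if attrs[best] == value]
        rest = [a for a in attributes if a != best]
        tree["branches"][value] = grow_tree(subset, rest)
    return tree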
Split on Shape Attribute
Split on Colour Attribute
Which attribute? Shape or colour?
A little bit of information theory
Measuring Information
Introducing Entropy
Set p_H = p and p_T = (1 - p)
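Presumably the slide showed the binary entropy; for a coin with P(heads) = p_H = p and P(tails) = p_T = 1 - p it is

H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)

which is 0 bits when p is 0 or 1 (the outcome is certain) and peaks at 1 bit when p = 1/2.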
Entropy and Information
The purity of a node
[Figure: the example set split on shape and on colour] Looks quite pure!
Measuring impurity
More generally, the impurity of a node for a non-binary (multi-way) split is given by the general entropy formula below:
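(Standard definition, restated here since the slide graphic is not reproduced.) For a node whose examples fall into classes with proportions p_1, ..., p_k:

H = -\sum_{i=1}^{k} p_i \log_2 p_i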
Splitting Criterion: Information Gain
Calculating Information Gain
Example training set:
On which days would you want to play tennis?
Choosing attributes to split on
Careful: E is used for entropy on this slide instead of H!
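A minimal sketch of the calculation, reusing the hypothetical entropy helper from the tree-growing sketch above. The gain of splitting a node S on attribute A is the parent entropy minus the weighted average entropy of the children, Gain(S, A) = H(S) - \sum_v (|S_v| / |S|) H(S_v):

def information_gain(examples, attribute):
    """Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v) for a split on attribute A."""
    labels = [label for _, label in examples]
    parent_entropy = entropy(labels)
    remainder = 0.0
    for value in set(attrs[attribute] for attrs, _ in examples):
        subset = [lbl for attrs, lbl in examples if attrs[attribute] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return parent_entropy - remainder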
Other impurity measures: Gini, Misclassification
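Standard definitions of these measures, stated here for reference (not reproduced from the slides), for class proportions p_1, ..., p_k at a node:

Gini = 1 - \sum_i p_i^2
Misclassification = 1 - \max_i p_i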
Other Splitting Criteria:
Information Gain Ratio
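The gain ratio (as used in Quinlan's C4.5) normalises the information gain by the entropy of the split itself, which penalises attributes with very many values:

GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A), where
SplitInfo(S, A) = -\sum_v (|S_v| / |S|) \log_2 (|S_v| / |S|)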
Hypothesis Search in ID3
Comments on ID3

A decision tree might classify the training set very well, yet do poorly on unseen test data

Overfitting

Could be true of any learning method
Preventing Overfitting
Growing the tree as far down as we can recursively will often result in overfitting.
The simplest solution is to change the test on the base case to be a threshold on the entropy. If the entropy is below some chosen value, we decide that this leaf is close enough to pure.
Another simple solution is to have a threshold on the size of your leaves; if the data set at some leaf has fewer than that number of elements, then don't split it further.
Another possible method is to only split if the split represents a real improvement. We can compare the entropy at the current node to the average entropy for the best attribute. If the entropy is not significantly decreased, then we could just give up.
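These three heuristics can be folded into the base case of the tree-growing sketch above. A hypothetical illustration, reusing that sketch's entropy helper; the threshold values are arbitrary placeholders, not values from the notes:

MIN_ENTROPY = 0.3    # treat a node below this entropy as pure enough (arbitrary)
MIN_LEAF_SIZE = 5    # don't split nodes with fewer examples than this (arbitrary)
MIN_GAIN = 0.01      # don't split unless the best attribute gains at least this much

def should_stop(examples, best_gain):
    """Pre-pruning test: stop growing when any of the three criteria fires."""
    labels = [label for _, label in examples]
    return (entropy(labels) < MIN_ENTROPY
            or len(examples) < MIN_LEAF_SIZE
            or best_gain < MIN_GAIN)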
Bertrand Russell on Induction
If asked why we believe the sun will rise tomorrow, we shall naturally answer, 'Because it has always risen every day.' We have a firm belief that it will rise in the future, because it has risen in the past.
The real question is: Do any number of cases of a law being fulfilled in the past afford evidence that it will be fulfilled in the future?
It has been argued that we have reason to know the future will resemble the past, because what was the future has constantly become the past, and has always been found to resemble the past, so that we really have experience of the future, namely of times which were formerly future, which we may call past futures.
But such an argument really begs the very question at issue. We have experience of past futures, but not of future futures, and the question is: Will future futures resemble past futures?
Cross-validation:
Managing Past Futures

Partition the training set of k*N examples into k partitions of N each
Train on k sets of size (k-1)*N and test on the remaining N; then average over these k results
For k=3, let S1, S2 and S3 be the partitions:
train on S1, S2 and test on S3
train on S1, S3 and test on S2
train on S2, S3 and test on S1
Choose the classifier which has the best average result
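A minimal sketch of this procedure; train and evaluate stand in for whatever learner and scoring function are being compared (both hypothetical names, not from the slides):

def cross_validate(examples, k, train, evaluate):
    """k-fold cross-validation: for each fold, train on the other k-1 folds,
    test on the held-out fold, and return the average of the k scores."""
    folds = [examples[i::k] for i in range(k)]   # k partitions of roughly N each
    scores = []
    for i in range(k):
        test_set = folds[i]
        train_set = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train(train_set)
        scores.append(evaluate(model, test_set))
    return sum(scores) / k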