A first introduction to the world of machine learning: how it relates to KDDM


Oct 14, 2013


©Vladimir Estivill-Castro
School of Computing and Information Technology
References:
Chapter 1 of "Computer Systems that Learn", Weiss and Kulikowski.
Chapter XIV of the Handbook of AI.
"Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations", Ian H. Witten and Eibe Frank, Morgan Kaufmann (2000).
Outline of Machine Learning
1. Machine Learning for classification and Bayes
2. Inductive (symbolic) Learning
3. Explanation-based Learning
   3.1 Inductive Logic Programming
4. Genetic Algorithms
5. Neural Networks
6. Support Vector Machines
7. Unsupervised Learning
What is Learning?
Any process by which a system improves its performance.
• A distinctive characteristic of intelligence.
• Skill acquisition.
• Theory formation, hypothesis formation, and inductive inference.
A model of Learning
• Two declarative bodies of information: the Environment and the Knowledge Base.
• Two procedures: the Learning Element and the Performance Element.
The Task
The task of the learning element can be
viewed as the task of bridging the gap
between the level at which the information
is provided by the environment and the level
at which the performance element can use
the information to carry out its functions.
Taxonomy of learning situations
• Rote Learning, in which the environment provides information at exactly the level of the performance task and, thus, no hypothesis is needed.
• Learning from examples, in which the information provided by the environment is too specific and detailed and, thus, the learning element must hypothesize more general rules.
• Learning by analogy, in which the information provided by the environment is relevant only to an analogous performance task and, thus, the learning system must discover the analogy and hypothesize analogous rules for its present performance task.
Factors affecting a learning system
• The environment
• The Knowledge Base:
  1. Representation tools: feature vectors or predicate calculus
  2. Expressiveness
  3. Ease of inference
  4. Modifiability of the knowledge base
  5. Extendibility
The Performance Element
• The simplest performance task is classification.
• Using so-called connectives, we can build complex sentences.
Learning from examples: introduction
• Examples can be viewed as pieces of very specific knowledge that cannot be used effectively by the performance element. These are transformed into more general, higher-level pieces of knowledge that can be used.
The Knowledge Base System
• The Knowledge Base
• The Inference Engine
Knowledge Engineering Bottleneck
• Knowledge Acquisition: to build the knowledge base.
• Knowledge Elicitation: to make the knowledge explicit.
• Knowledge Maintenance: to update and revise the knowledge base.
Machine Learning techniques are mature
• Machine Learning systems are used in many industries:
 - Voice recognition
 - Credit assignment
 - Satellite-image classification
 - Knowledge discovery in databases (Data Mining)
Classification
• The most well-studied machine learning task.
• TASK: Find the class to which an example (instance) belongs.
 - Classification procedure: a formal method to repeatedly classify new examples (instances).
 - Machine Learning builds classification procedures.
 - Machine Learning: a program that builds a program.
 - Input to a machine learning program: old examples.
 - Also called learning from examples, or inductive learning.
• Supervised learning:
 - Old examples have already been classified.
• Unsupervised learning:
 - Find clusters, i.e., classes or groups.
Classification Examples
• Assign a letter to its destination (recognize handwritten text).
• Diagnose the illness of a patient from some symptoms.
• Identify the cause of a failure in a machine from observing unusual behavior.
• Indicate the procedure to follow when a factory is away from its normal operation.
• Indicate what decision to make given the data of the current situation.
Classification (Approaches)
• Based on statistics (discriminants)
• Artificial Intelligence (decision rules)
• Mathematical structures (Rough Sets)
• Information Theory (Entropy Discretization)
• Natural analogy: Genetic algorithms and
Neural Networks.
Classification Goals
• Match, if not improve on, the human capacity for decision making.
• Make the classification process explicit.
• Manage a large variety of problems (lack of experts).
• Efficiency (improve the speed at which mail is classified).
• Avoid bias (humans tend to pre-judge).
• Avoid expensive procedures (accurate diagnosis before surgery).
• The supervisor may be the verdict of history.
Factors in a classification procedure
• Precision: the error rate of classification.
• Speed: the time it takes to make a decision.
• Clarity: the explanation for the classification.
• Learning speed: the time it takes to build or update the classifier.
• What are Knowledge-Based Systems?
• What applications of Knowledge-Based Systems have you heard of?
• What have you heard about machine learning?
• What links have you seen between Knowledge Discovery and Machine Learning?
A first introduction to symbolic approaches to inductive learning.
Inductive Learning
To learn a general concept from examples.
In the model of learning, the environment provides information more specific than what is in the knowledge base to be used by the performance element.
Inductive symbolic learning of concepts
A concept is a logic rule for classification that divides the domain of possible examples into those that fulfill the rule and those that do not.
Example: "x is prime if and only if x is an integer greater than 1 and x is divisible only by itself and 1".
A classification rule is a concise description (symbolic, i.e., in a formal language) of the examples that belong to a concept.
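The prime-number rule can be written directly as a predicate; the concept is then exactly the subset of the domain the rule selects. A minimal sketch (function and variable names are my own, not from the slides):

```python
def is_prime(x):
    # Logic rule: x is prime iff x is an integer greater than 1
    # divisible only by itself and 1.
    if not isinstance(x, int) or x < 2:
        return False
    return all(x % d != 0 for d in range(2, int(x ** 0.5) + 1))

# The meaning of the rule is the subset of the domain it selects:
domain = range(1, 20)
concept = {x for x in domain if is_prime(x)}
print(concept)  # the primes below 20
```

This makes the "concepts as sets" view on the next slide concrete: the rule is a sentence, and its model is the induced subset of the domain.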
Concepts as sets
This is a powerful tool.
Expressions in the formal language are matched to a set in the domain.
The meaning of a sentence in the language is the associated subset of the domain:
exactly those instances that make the sentence true (a model).
Assumption: a concept partitions the domain into the concept and its complement.
Scenario for inductive learning of concepts
A tutor presents (labeled) examples and counter-examples to the learning element.
It is possible that only positive examples are presented.
It is possible that only negative examples are presented.
GOAL: a logic rule that generalizes all the positive examples and excludes, as best as possible, the negative examples.
A rule consistent with a data set is a rule that has no apparent error on it.
Inductive learning as a search problem
In the space of all classification rules, find the best classification rule.
Analogous to linear discriminants: in the space of all linear discriminants, find the one that minimizes the true error rate.
Unified framework
The system is guided by data.
It has a representation of knowledge as a language.
Expressions in the language have a subset of the domain as their meaning.
Under these conditions, the sentences of the logic language can be given the structure of a partial order according to the level of generality of the concepts they denote.
The relation is "more general than" between concepts (or, equivalently, logic expressions).
The search space
The set of all expressions of the logic language.
The most general point in the search space is the empty expression; we will assume that the empty expression is true of every element in the universe.
The most specific points in the search space are logic rules characterizing only one element.
The set H of possible hypotheses can be represented compactly using the partial order.
Bias
How does the learner choose among all the (valid) hypotheses?
The evidence provided by the tutor should guide the search.
BIAS: how the learner gives preference to some valid hypotheses over the other valid hypotheses.
Version spaces
Introduced by Mitchell, the technique goes as follows:
1. Perform the least committed revision of the frontier of H each time the tutor presents an example or a counter-example.
2. A hypothesis remains valid as long as it has not been contradicted by the evidence.
The initial version of H is the complete space of logic rules.
If all examples and counter-examples are presented and H has only one hypothesis, this is the concept to be learned.
The more accurate versions of H are the version spaces for the goal concept to be learned.
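The "least committed revision" idea can be illustrated with Find-S, a simplification of version spaces that tracks only the most specific hypothesis over attribute vectors, generalizing it just enough to cover each positive example. A hedged sketch (the '?' wildcard convention and all names are my own, not the slides' formalism):

```python
def find_s(examples):
    # Maintain the most specific hypothesis consistent with the
    # positive examples.  '?' matches any value (most general);
    # None means "no value seen yet" (most specific).
    hypothesis = None
    for attrs, label in examples:
        if not label:            # Find-S ignores negative examples
            continue
        if hypothesis is None:   # first positive example: copy it
            hypothesis = list(attrs)
        else:                    # least committed generalization
            hypothesis = [h if h == a else '?'
                          for h, a in zip(hypothesis, attrs)]
    return hypothesis

data = [
    (('sunny', 'warm', 'high'), True),
    (('sunny', 'warm', 'low'),  True),
    (('rainy', 'cold', 'high'), False),
]
print(find_s(data))  # ['sunny', 'warm', '?']
```

Full version spaces (candidate elimination) additionally maintain a most general frontier and shrink it on counter-examples; this sketch shows only the specific side of that revision.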
Decision trees
Supervised learning.
Goal is classification.
Input is in attribute-vector format.
Classical versions: attributes have a small number of possible discrete values.
CLS (Concept Learning System), Hunt [1966]; ID3 and C4.5, Quinlan [1986, 1993].
Decision trees are an encoding of logic rules.
How do they work?
Start at the root of the tree.
A node is a question regarding one attribute.
The answer indicates the sub-tree that must be visited next.
Continue down until a leaf is reached.
Leaves are labeled with a class.
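The walk from root to leaf can be sketched over a tree of nested dictionaries (the tree, its attributes, and its values are illustrative, not from the slides):

```python
# Internal nodes ask about one attribute; leaves carry a class label.
tree = {
    'attribute': 'outlook',
    'branches': {
        'sunny': {'attribute': 'humidity',
                  'branches': {'high': 'NO', 'normal': 'YES'}},
        'overcast': 'YES',
        'rainy': {'attribute': 'windy',
                  'branches': {'true': 'NO', 'false': 'YES'}},
    },
}

def classify(node, example):
    # Start at the root; the answer to each node's question selects
    # the sub-tree to visit next, until a leaf (a class label).
    while isinstance(node, dict):
        node = node['branches'][example[node['attribute']]]
    return node

print(classify(tree, {'outlook': 'sunny', 'humidity': 'normal'}))  # YES
```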
Decision trees
Naturally represent concepts in disjunctive form.
There may be many trees representing the same concept:
 - difficult to validate whether concepts are equivalent.
The model fits the concept with layers of boxes with sides parallel to the axes:
 - does not fit diagonal concepts well (example: XOR).
Representation is slightly more restrictive than DNF: the first term must be a common attribute, then mutually exclusive terms.
How is the tree built?
1. Start with a tree with one node holding all the examples, and gradually expand nodes as follows.
2. Make a leaf if the node is homogeneous:
   2.1 If all examples in a node are positive, make it a leaf node and label it YES.
   2.2 If all examples in a node are negative, make it a leaf node and label it NO.
3. If the node is heterogeneous, partition the training set C into subsets C_1, …, C_t by the possible values of an attribute A, and send each subset C_i to the corresponding child node.
4. After this, the examples in each child node are homogeneous in their values for attribute A.
5. Apply the algorithm recursively to the subsets C_i.
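These steps can be sketched recursively. This hedged version naively picks the first remaining attribute rather than the entropy-based choice discussed next; the data and all names are illustrative assumptions of mine:

```python
def build_tree(examples, attributes):
    # Step 2: make a leaf if the node is homogeneous.
    labels = [label for _, label in examples]
    if all(labels):
        return 'YES'
    if not any(labels):
        return 'NO'
    if not attributes:           # no attribute left: majority vote
        return 'YES' if labels.count(True) >= labels.count(False) else 'NO'
    # Step 3: partition C into C_1, ..., C_t by the values of an
    # attribute A (naive choice here: the first remaining attribute).
    attr = attributes[0]
    partition = {}
    for attrs, label in examples:
        partition.setdefault(attrs[attr], []).append((attrs, label))
    # Step 5: recurse on each subset C_i.
    return {'attribute': attr,
            'branches': {value: build_tree(subset, attributes[1:])
                         for value, subset in partition.items()}}

data = [({'outlook': 'sunny', 'windy': 'false'}, True),
        ({'outlook': 'sunny', 'windy': 'true'}, False),
        ({'outlook': 'rainy', 'windy': 'true'}, False)]
tree = build_tree(data, ['outlook', 'windy'])
```

Each child node then holds examples homogeneous in their value for the chosen attribute (step 4), and recursion stops once the labels themselves are homogeneous.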
Criteria to select the attribute A
Ideally, a method to find the shortest tree.
It is a combinatorially difficult problem that must be solved heuristically.
Select the attribute that discriminates (homogenizes) the most between positive and negative examples.
Many possibilities have been suggested, and new proposals still emerge.
Criteria to select the attribute A
A common criterion, after Quinlan, is based on Information Theory:
select the attribute that reduces the entropy the most (the entropy is a measure of uncertainty).
The entropy is estimated by estimating probabilities by maximum likelihood in the set C.
Split formula
The value of splitting on attribute A with possible values v_1, …, v_t over the training set C:

E(A, C) = Σ_{i=1}^{t} (|C_i| / |C|) · E(C_i)

where C_i is the subset of C whose examples take value v_i for A, and E(C_i) is the entropy of the class distribution in C_i. The attribute minimizing E(A, C), i.e., maximizing the information gain E(C) − E(A, C), is selected.
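A hedged sketch of this computation (function names are my own; probabilities are estimated by maximum-likelihood relative frequencies, as the previous slide states):

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Entropy of a label multiset, probabilities estimated as
    # relative frequencies (maximum likelihood).
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(examples, attr):
    # Information gain of splitting the set C on `attr`:
    # E(C) minus the size-weighted entropies of the subsets C_i.
    labels = [label for attrs, label in examples]
    subsets = {}
    for attrs, label in examples:
        subsets.setdefault(attrs[attr], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s)
                   for s in subsets.values())
    return entropy(labels) - weighted

data = [({'windy': 'true'},  False),
        ({'windy': 'true'},  False),
        ({'windy': 'false'}, True),
        ({'windy': 'false'}, True)]
print(gain(data, 'windy'))  # 1.0: this split removes all uncertainty
```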
Symbolic learning
Motivated by the goal to represent knowledge in formats that are easily understood and more compatible with human reasoning.
Problems with Decision Trees
• Gap between the apparent error rate (in training) and the true error rate.
• Managing continuous attributes: the attribute cannot be split by values, but by intervals; then, which intervals?
• Many branches and small sub-trees.
• The heuristic for the best tree is a hill-climber.
• Form of rules vs. DNF; nearly accurate rules.