# Ch10 Machine Learning: Symbol-Based

Artificial Intelligence and Robotics

16 Oct 2013

Dr. Bernard Chen Ph.D.

University of Central Arkansas

Spring 2011

Machine Learning Outline

The book presents four chapters on machine learning, reflecting four approaches to the problem:

Symbol Based

Connectionist

Genetic/Evolutionary

Stochastic

Ch.10 Outline

A framework for Symbol-Based Learning

ID3 Decision Tree

Unsupervised Learning

The Framework for Symbol-Based Learning

The Framework Example

Data

The representation:

Size(small)^color(red)^shape(round)

Size(large)^color(red)^shape(round)

The Framework Example

A set of operations:

Based on

Size(small)^color(red)^shape(round)

replacing a single constant with a variable produces the generalizations:

Size(X)^color(red)^shape(round)

Size(small)^color(X)^shape(round)

Size(small)^color(red)^shape(X)
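The generalization operation above can be sketched in a few lines of Python. This is only an illustration: storing a concept as a dictionary and the names `generalize_one_constant` and `show` are assumptions, not part of the slides.

```python
# A concept is a conjunction of property(value) terms, stored here as a dict.
# The dict layout and the function names are assumptions for illustration.

def generalize_one_constant(concept, variable="X"):
    """Return every concept obtained by replacing exactly one constant with a variable."""
    generalizations = []
    for prop in concept:
        more_general = dict(concept)   # copy so the original concept is untouched
        more_general[prop] = variable  # the variable stands for any value
        generalizations.append(more_general)
    return generalizations


def show(concept):
    """Render a concept in the slide notation, e.g. Size(small)^color(red)."""
    return "^".join(f"{prop}({value})" for prop, value in concept.items())


ball = {"Size": "small", "color": "red", "shape": "round"}
for g in generalize_one_constant(ball):
    print(show(g))
```

Each printed line corresponds to one of the three generalizations listed above.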

The Framework Example

The concept space

The learner must search this space to find
the desired concept.

The complexity of this concept space is a
primary measure of the difficulty of a
learning problem

The Framework Example

Heuristic search:

Based on

Size(small)^color(red)^shape(round)

The learner will make that example a candidate “ball” concept; this
concept correctly classifies the only positive instance

If the algorithm is given a second positive instance

Size(large)^color(red)^shape(round)

The learner may generalize the candidate “ball” concept to

Size(Y)^color(red)^shape(round)
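One possible way to carry out this generalization step is sketched below. It assumes that variables are written as single uppercase letters and that concepts and instances are dictionaries; the helper names are hypothetical.

```python
def is_variable(value):
    # Assumption: variables are single uppercase letters such as X or Y.
    return len(value) == 1 and value.isupper()


def covers(concept, instance):
    """True if every term of the concept matches the instance."""
    return all(is_variable(v) or instance[p] == v for p, v in concept.items())


def generalize_to_cover(concept, instance, fresh_names="YZWVU"):
    """Replace each constant that disagrees with the instance by a fresh variable."""
    names = iter(fresh_names)
    generalized = {}
    for prop, value in concept.items():
        if is_variable(value) or instance[prop] == value:
            generalized[prop] = value        # term already matches, keep it
        else:
            generalized[prop] = next(names)  # mismatch: generalize this term
    return generalized


candidate = {"Size": "small", "color": "red", "shape": "round"}
second_positive = {"Size": "large", "color": "red", "shape": "round"}
candidate = generalize_to_cover(candidate, second_positive)
print(candidate)
```

The resulting candidate keeps `color` and `shape` fixed and covers both positive instances.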

Learning process

The training data is a series of positive and negative examples of the concept: examples of blocks world structures that fit the category, along with near misses.

The latter are instances that almost belong to the category but fail on one property or relation

Examples and near misses for
the concept arch

Learning process

This approach was proposed by Patrick Winston (1975)

The program performs a hill climbing search on the concept space guided by the training data

Because the program does not backtrack, its performance is highly sensitive to the order of the training examples

A bad ordering can lead the program to dead ends in the search space

Ch.10 Outline

A framework for Symbol-Based Learning

ID3 Decision Tree

Unsupervised Learning

ID3 Decision Tree

ID3, like candidate elimination, induces
concepts from examples

It is particularly interesting for

Its representation of learned knowledge

Its approach to the management of complexity

Its heuristic for selecting candidate concepts

Its potential for handling noisy data

ID3 Decision Tree

The previous table can be represented
as the following decision tree:

ID3 Decision Tree

In a decision tree, each internal node represents a test on some property

Each possible value of that property corresponds to a branch of the tree

Leaf nodes represent classifications, such as low or moderate risk
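To make this structure concrete, here is a minimal sketch of how such a tree might be stored and used for classification. The tree below is hypothetical and is not the credit risk tree from the slides; the tuple-and-dict encoding is also just an assumption.

```python
# Internal node: (property, {value: subtree}); leaf: a classification string.
# This small tree is made up for illustration only.
tree = ("income", {
    "low": "high risk",
    "medium": ("credit history", {"good": "moderate risk", "bad": "high risk"}),
    "high": "low risk",
})


def classify(node, example):
    """Follow the branch matching the example's value at each test until a leaf."""
    while not isinstance(node, str):
        prop, branches = node
        node = branches[example[prop]]
    return node


print(classify(tree, {"income": "medium", "credit history": "good"}))
```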

ID3 Decision Tree

A simplified decision tree for credit risk
management

ID3 Decision Tree

ID3 constructs decision trees in a top-down fashion.

ID3 selects a property to test at the current node of the tree and uses this test to partition the set of examples

The algorithm recursively constructs a subtree for each partition

This continues until all members of the partition are in the same class

ID3 Decision Tree

For example, ID3 selects income as the
root property for the first step

ID3 Decision Tree

How to select the 1st node? (and the following nodes)

ID3 measures the information gained by making each property the root of the current subtree

It picks the property that provides the
greatest information gain

ID3 Decision Tree

If we assume that all the examples in
the table occur with equal probability,
then:

P(risk is high)=6/14

P(risk is moderate)=3/14

P(risk is low)=5/14

ID3 Decision Tree

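$I[6,3,5] = -\frac{6}{14}\log_2\frac{6}{14} - \frac{3}{14}\log_2\frac{3}{14} - \frac{5}{14}\log_2\frac{5}{14} = 1.531$ bits

Based on

$I[M] = \sum_{i=1}^{n} -p(m_i)\log_2 p(m_i)$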
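The value of I[6,3,5] can be checked numerically. The short sketch below assumes only the class counts given above (6 high, 3 moderate, 5 low); the function name `information` is chosen for illustration.

```python
from math import log2


def information(counts):
    """Information content in bits of a class distribution given as counts."""
    total = sum(counts)
    # Classes with zero count contribute nothing and are skipped.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


print(round(information([6, 3, 5]), 3))  # 1.531
```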

ID3 Decision Tree

The information gain from income is:

Gain(income) = I[6,3,5] - E[income] = 1.531 - 0.564 = 0.967

Similarly,

Gain(credit history)=0.266

Gain(debt)=0.063

Gain(collateral)=0.206

ID3 Decision Tree

Since income provides the greatest
information gain, ID3 will select it as
the root of the tree

Attribute Selection Measure:
Information Gain (ID3/C4.5)

Select the attribute with the highest
information gain

Let $p_i$ be the probability that an arbitrary tuple in $D$ belongs to class $C_i$, estimated by $|C_{i,D}|/|D|$

Expected information (entropy) needed to classify a tuple in $D$:

$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$

Attribute Selection Measure:
Information Gain (ID3/C4.5)

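Information needed (after using $A$ to split $D$ into $v$ partitions) to classify $D$:

$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$

Information gained by branching on attribute $A$:

$Gain(A) = Info(D) - Info_A(D)$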

ID3 Decision Tree

Pseudo Code
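The following is a minimal Python sketch of the top-down procedure described earlier in this chapter. It is not the pseudocode from the original slide. The four training examples are hypothetical, and the majority-vote fallback when no properties remain is an assumption.

```python
from collections import Counter
from math import log2


def information(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def partition(examples, prop):
    """Group the examples by their value for the given property."""
    groups = {}
    for ex in examples:
        groups.setdefault(ex[prop], []).append(ex)
    return groups


def gain(examples, prop, target):
    """Information gained by testing prop at the root of the current subtree."""
    before = information([ex[target] for ex in examples])
    after = sum(len(g) / len(examples) * information([ex[target] for ex in g])
                for g in partition(examples, prop).values())
    return before - after


def id3(examples, properties, target):
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:            # all members are in the same class
        return labels[0]
    if not properties:                   # assumption: fall back to the majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(properties, key=lambda p: gain(examples, p, target))
    remaining = [p for p in properties if p != best]
    # Recursively build one subtree per partition of the examples.
    return (best, {value: id3(group, remaining, target)
                   for value, group in partition(examples, best).items()})


# Hypothetical training data, not the table from the slides.
examples = [
    {"income": "low", "debt": "high", "risk": "high risk"},
    {"income": "low", "debt": "low", "risk": "high risk"},
    {"income": "high", "debt": "high", "risk": "moderate risk"},
    {"income": "high", "debt": "low", "risk": "low risk"},
]
print(id3(examples, ["income", "debt"], "risk"))
```

On this toy data, income gives a gain of 1.0 bit against 0.5 bits for debt, so it becomes the root.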

Another Decision Tree Example

Decision Tree Example

$Info(Tenured) = I(3,3) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6} = 1$

$\log_2(12) = \log 12 / \log 2 = 1.07918 / 0.30103 = 3.584958$

How to calculate $\log_2$:
http://www.ehow.com/how_5144933_calculate-log.html

Convenient tool:
http://web2.0calc.com/

Decision Tree Example

$Info_{RANK}(Tenured) = \frac{3}{6} I(1,2) + \frac{2}{6} I(1,1) + \frac{1}{6} I(1,0) = \frac{3}{6}(0.918) + \frac{2}{6}(1) + \frac{1}{6}(0) = 0.79$

3/6 I(1,2) means “Assistant Prof” has 3 out of 6 samples, with 1 yes and 2 no's.

2/6 I(1,1) means “Associate Prof” has 2 out of 6 samples, with 1 yes and 1 no.

1/6 I(1,0) means “Professor” has 1 out of 6 samples, with 1 yes and 0 no's.

Decision Tree Example

$Info_{YEARS}(Tenured) = \frac{1}{6} I(1,0) + \frac{2}{6} I(0,2) + \frac{1}{6} I(0,1) + \frac{2}{6} I(2,0) = 0$

1/6 I(1,0) means “years=2” has 1 out of 6 samples, with 1 yes and 0 no's.

2/6 I(0,2) means “years=3” has 2 out of 6 samples, with 0 yes's and 2 no's.

1/6 I(0,1) means “years=6” has 1 out of 6 samples, with 0 yes's and 1 no.

2/6 I(2,0) means “years=7” has 2 out of 6 samples, with 2 yes's and 0 no's.
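The figures in this example can be reproduced from the yes/no counts alone. The sketch below takes the counts stated above as input; the function names are assumptions, and the gains it prints are derived here rather than taken from the slides.

```python
from math import log2


def info(*counts):
    """I(c1, c2, ...): entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def info_after_split(partitions):
    """Weighted entropy after a split; each partition is a (yes, no) count pair."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(*p) for p in partitions)


rank = [(1, 2), (1, 1), (1, 0)]            # Assistant, Associate, Professor
years = [(1, 0), (0, 2), (0, 1), (2, 0)]   # years = 2, 3, 6, 7

info_tenured = info(3, 3)
print(round(info_after_split(rank), 2))                  # 0.79
print(round(info_tenured - info_after_split(rank), 2))   # gain from RANK
print(round(info_tenured - info_after_split(years), 2))  # gain from YEARS
```

Under these counts, splitting on years leaves no remaining uncertainty, so it yields the larger gain of the two attributes.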

Ch.10 Outline

A framework for Symbol-Based Learning

ID3 Decision Tree

Unsupervised Learning

Unsupervised Learning

The learning algorithms discussed so far implement forms of supervised learning

They assume the existence of a teacher, some fitness measure, or other external method of classifying training instances

Unsupervised learning eliminates the teacher and requires that the learners form and evaluate concepts on their own

Unsupervised Learning

Science is perhaps the best example of
unsupervised learning in humans

Scientists do not have the benefit of a teacher.

Instead, they propose hypotheses to explain observations.

Unsupervised Learning

The clustering problem starts with (1) a
collection of unclassified objects and (2) a
means for measuring the similarity of objects

The goal is to organize the objects into
classes that meet some standard of quality,
such as maximizing the similarity of objects in
the same class

Unsupervised Learning

Numeric taxonomy is one of the oldest
approaches to the clustering problem

A reasonable similarity metric treats each object as a point in n-dimensional space

The similarity of two objects is measured by the Euclidean distance between them in this space: the smaller the distance, the more similar the objects

Unsupervised Learning

Using this similarity metric, a common clustering algorithm builds clusters in a bottom-up fashion, also known as agglomerative clustering:

Examine all pairs of objects, select the pair with the highest degree of similarity, and make that pair a cluster

Define the features of the cluster as some function (such as average) of the features of the component members, then replace the component objects with this cluster definition

Repeat this process on the collection of objects until all objects have been reduced to a single cluster
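The bottom-up procedure above might be sketched as follows. This is an illustrative version only: it assumes numeric feature tuples, Euclidean distance via `math.dist` (Python 3.8+), and a plain unweighted average of the two merged feature vectors as the cluster definition.

```python
from itertools import combinations
from math import dist


def agglomerate(points):
    """Merge the closest pair repeatedly; return the result as a nested tree.

    Leaves are the original points (tuples); internal nodes are two-element lists.
    """
    # Each cluster is (feature vector, tree built so far).
    clusters = [(tuple(p), tuple(p)) for p in points]
    while len(clusters) > 1:
        # Highest similarity = smallest Euclidean distance between feature vectors.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]][0], clusters[ij[1]][0]))
        (fa, ta), (fb, tb) = clusters[i], clusters[j]
        # Assumption: the new cluster's features are the average of the two merged ones.
        merged = tuple((a + b) / 2 for a, b in zip(fa, fb))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((merged, [ta, tb]))
    return clusters[0][1]


print(agglomerate([(0, 0), (0, 1), (5, 5), (5, 7)]))
```

The nested lists trace the order in which the pairs were merged, with the closest pair merged first.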

Unsupervised Learning

The result of this algorithm is a binary tree whose leaf nodes are instances and whose internal nodes are clusters of increasing size

We may also extend this algorithm to
objects represented as sets of symbolic
features.

Unsupervised Learning

Object1={small, red, rubber, ball}

Object2={small, blue, rubber, ball}

Object3={large, black, wooden, ball}

A metric based on the proportion of features two objects have in common would compute the similarity values:

Similarity(object1, object2)=3/4

Similarity(object1, object3)=1/4
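These two values can be reproduced with Python sets. The sketch assumes the metric is the number of shared features divided by the size of the larger feature set, which is one reading consistent with the values above.

```python
def similarity(a, b):
    """Proportion of features two objects have in common (assumed definition)."""
    return len(a & b) / max(len(a), len(b))


object1 = {"small", "red", "rubber", "ball"}
object2 = {"small", "blue", "rubber", "ball"}
object3 = {"large", "black", "wooden", "ball"}

print(similarity(object1, object2))  # 0.75
print(similarity(object1, object3))  # 0.25
```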

Partitioning Algorithms: Basic Concept

Given a k, find a partition of k clusters that optimizes the chosen partitioning criterion

Global optimal: exhaustively enumerate all partitions

Heuristic methods: k-means and k-medoids algorithms

k-means (MacQueen’67): Each cluster is represented by the center of the cluster

k-medoids or PAM (Partition around medoids) (Kaufman & Rousseeuw’87): Each cluster is represented by one of the objects in the cluster

The K-Means Clustering Method

Given k, the k-means algorithm is implemented in four steps:

1. Partition objects into k nonempty subsets

2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster)

3. Assign each object to the cluster with the nearest seed point

4. Go back to Step 2; stop when there are no more new assignments
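The four steps can be sketched in Python as below. This is a minimal illustration under stated assumptions: points are numeric tuples, distance is Euclidean (`math.dist`, Python 3.8+), the initial partition is a simple round-robin split, and clusters that become empty are dropped.

```python
from math import dist


def centroid(cluster):
    """Mean point of a non-empty cluster of equal-length tuples."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def k_means(points, k, max_iterations=100):
    # Step 1: partition the objects into k subsets (round-robin is an assumption).
    clusters = [points[i::k] for i in range(k)]
    for _ in range(max_iterations):
        # Step 2: compute the seed points as centroids of the current clusters.
        seeds = [centroid(c) for c in clusters if c]
        # Step 3: assign each object to the cluster with the nearest seed point.
        new_clusters = [[] for _ in seeds]
        for p in points:
            nearest = min(range(len(seeds)), key=lambda i: dist(p, seeds[i]))
            new_clusters[nearest].append(p)
        # Step 4: stop when the assignment no longer changes.
        if new_clusters == clusters:
            break
        clusters = new_clusters
    return clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
print(k_means(points, 2))
```

Different initial partitions can lead to different final clusters, since the method is a heuristic rather than an exhaustive search.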

K-means Clustering

The K-Means Clustering Method

[Figure: k-means with K=2, shown as scatter plots with both axes running from 0 to 10. Panel labels: arbitrarily choose K objects as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign; update the cluster means; reassign.]

Example

Run K-means clustering with 3 clusters (initial centroids: 3, 16, 25) for at least 2 iterations

Example

Centroids:

3: 2 3 4 7 9, new centroid: 5

16: 10 11 12 16 18 19, new centroid: 14.33

25: 23 24 25 30, new centroid: 25.5

Example

Centroids:

5: 2 3 4 7 9, new centroid: 5

14.33: 10 11 12 16 18 19, new centroid: 14.33

25.5: 23 24 25 30, new centroid: 25.5
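The two iterations above can be replayed with a short one-dimensional sketch. The list of data values is assumed from the cluster memberships shown in the example, and the code assumes no cluster ever becomes empty.

```python
def k_means_1d(values, centroids, iterations=2):
    """Run a fixed number of k-means iterations on 1-D data and record each one."""
    history = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign the value to the nearest current centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update each centroid to the mean of its cluster (assumed non-empty).
        centroids = [sum(c) / len(c) for c in clusters]
        history.append((clusters, centroids))
    return history


values = [2, 3, 4, 7, 9, 10, 11, 12, 16, 18, 19, 23, 24, 25, 30]
for clusters, centroids in k_means_1d(values, [3, 16, 25]):
    print(clusters, [round(c, 2) for c in centroids])
```

Both iterations produce the same clusters and the centroids 5, 14.33 and 25.5, so the assignment has already stabilized after the first pass.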

In-class Practice

Run K-means clustering with 3 clusters (initial centroids: 3, 12, 19) for at least 2 iterations