Lecture 10 & 11. Machine learning


Artificial Intelligence

University Politehnica of Bucharest

2008 - 2009

Adina Magda Florea

http://turing.cs.pub.ro/aifils_08

Course No. 10, 11

Machine learning


Types of learning


Learning by decision trees


Learning disjunctive concepts


Learning in version space




1. Types of learning


Specific inferences


Inductive inference




Abductive inference





Analogical inference

Example (abduction): from the rule (∀x)(PlouaPeste(x) → Uda(x)), "if it rains on x, then x gets wet", and the observation Uda(iarba), "the grass is wet", abductive inference concludes PlouaPeste(iarba).

[Figure: general structure of a learning system. Components: Environment (data), Teacher (feedback), Learning process, Knowledge Base (K & B), Problem solving (inferences, strategy), Performance evaluation, Learning results / Results.]

General structure of a learning system


Types of learning:

Learning through memorization

Learning through instruction / operationalization

Learning through induction (from examples)

Learning through analogy

2. Decision trees. ID3 algorithm


Inductive learning


Learns concept descriptions from examples


Examples (instances of concepts) are
defined by attributes and classified in
classes


Concepts are represented as a decision tree in which every level of the tree is associated with an attribute

The leaves are labeled with concepts (classes)


Building and using the decision tree

First build the decision tree from examples

Label the leaves with YES or NO (one class) or with the class Ci

Unknown instances are then classified by following a path in the decision tree according to the values of their attributes, as in the sketch below
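A minimal Prolog sketch of this classification step (my own representation, not from the lecture): a tree is either leaf(Class) or node(Attribute, [Value-Subtree, ...]) and an instance is a list of Attribute=Value pairs.

% Follow the branch whose label matches the instance's value for the
% attribute tested in the node, until a leaf is reached.
classify(leaf(Class), _Instance, Class).
classify(node(Attr, Branches), Instance, Class) :-
    member(Attr=Value, Instance),       % value of the tested attribute
    member(Value-Subtree, Branches),    % branch labeled with that value
    classify(Subtree, Instance, Class).

% Hypothetical query (the tree and values are made up for illustration):
% ?- classify(node(income, ['$0-15k'-leaf(high)]),
%             [income='$0-15k', debt=high], Risk).
% Risk = high.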

Example


No.  Risk (Classification)  Credit History  Debt  Collateral  Income
 1   High                   Bad             High  None        $0 to $15k
 2   High                   Unknown         High  None        $15k to $35k
 3   Moderate               Unknown         Low   None        $15k to $35k
 4   High                   Unknown         Low   None        $0 to $15k
 5   Low                    Unknown         Low   None        Over $35k
 6   Low                    Unknown         Low   Adequate    Over $35k
 7   High                   Bad             Low   None        $0 to $15k
 8   Moderate               Bad             Low   Adequate    Over $35k
 9   Low                    Good            Low   None        Over $35k
10   Low                    Good            High  Adequate    Over $35k
11   High                   Good            High  None        $0 to $15k
12   Moderate               Good            High  None        $15k to $35k
13   Low                    Good            High  None        Over $35k
14   High                   Bad             High  None        $15k to $35k

Credit risk evaluation example

Algorithm for building the decision tree

function tree(ex_set, attributes, default)
1. if ex_set = empty
   then return a leaf labeled with default
2. if all examples in ex_set are in the same class
   then return a leaf labeled with that class
3. if attributes = empty
   then return a leaf labeled with the disjunction of the classes in ex_set
4. Select an attribute A, create a node for A and label the node with A
   - remove A from attributes -> attributes'
   - m = majority(ex_set)
   - for each value V of A repeat
       - let partitionV be the set of examples from ex_set with value V for A
       - create nodeV = tree(partitionV, attributes', m)
       - create a link node A -> nodeV and label the link with V
end
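A compact Prolog sketch of the function above, under my own assumptions (an example is ex(Features, Class) with Features a list of Attr=Value pairs; attributes are given as Attr-Values pairs; the resulting tree uses the same leaf/node representation as the classify/3 sketch earlier; forall/2, msort/2 and last/2 are SWI-Prolog built-ins). For brevity it expands the first remaining attribute instead of the one ID3 selects by information gain.

% step 1: no examples left - a leaf labeled with the default class
tree([], _Attrs, Default, leaf(Default)).
% step 2: all examples in the same class - a leaf labeled with that class
tree(Exs, _Attrs, _Default, leaf(C)) :-
    Exs = [ex(_,C)|_],
    forall(member(ex(_,C2), Exs), C2 == C), !.
% step 3: no attributes left - a leaf labeled with the remaining classes
tree(Exs, [], _Default, leaf(Classes)) :-
    setof(C, F^member(ex(F,C), Exs), Classes).
% step 4: expand an attribute and recurse on the partition of each value
tree(Exs, [A-Values|Attrs], _Default, node(A, Branches)) :-
    majority(Exs, M),
    findall(V-Sub,
            ( member(V, Values),
              findall(E, (member(E, Exs), E = ex(F,_), member(A=V, F)), Part),
              tree(Part, Attrs, M, Sub) ),
            Branches).

% majority class of a set of examples (ties broken arbitrarily)
majority(Exs, M) :-
    setof(C, F^member(ex(F,C), Exs), Classes),
    findall(N-C, ( member(C, Classes),
                   findall(x, member(ex(_,C), Exs), L), length(L, N) ),
            Counts),
    msort(Counts, Sorted),
    last(Sorted, _-M).

% ?- tree([ex([debt=high], high_risk), ex([debt=low], low_risk)],
%         [debt-[high, low]], none, T).
% T = node(debt, [high-leaf(high_risk), low-leaf(low_risk)]).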

Remarks

Different decision trees can be built from the same set of examples

The depth of different decision trees differs

Occam's razor: build the simplest tree

Information theory

Given a universe of messages M = {m1, m2, ..., mn} and a probability p(mi) of occurrence of every message in M, the information content of M can be defined as:

    I(M) = Σi -p(mi) log2 p(mi)

Information content I(Arb) of the decision tree

p(risk is high) = 6/14
p(risk is moderate) = 3/14
p(risk is low) = 5/14

The information content of the decision tree is:

    I(Arb) = -6/14 log2(6/14) - 3/14 log2(3/14) - 5/14 log2(5/14) = 1.531 bits
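The same quantity can be computed with a small Prolog helper (my own sketch, not part of the lecture's code; log/1 is the natural logarithm, so the base-2 logarithm is obtained by dividing by log(2)):

% information_content(Probs, I): I = - sum of p*log2(p) over the given
% class probabilities, in bits
information_content([], 0).
information_content([P|Ps], I) :-
    information_content(Ps, I1),
    I is I1 - P * log(P) / log(2).

% ?- information_content([6/14, 3/14, 5/14], I).
% I = 1.5306...   (the value of I(Arb) above, up to rounding)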

Information gain G(A)


For an attribute
A
, the information gain
obtained by selecting this attribute as the
root of the tree equals the total information
content of the tree minus the information
content that is necessary to finish the
classification (building the tree), after
selecting A as root





G(A) = I(Arb) - E(A)

Computing E(A)


Let C be the set of learning examples

An attribute A with n values placed in the root divides C into {C1, C2, ..., Cn}

    E(A) = Σi ( |Ci| / |C| ) I(Ci)


Example

"Income" as root:

C1 = {1, 4, 7, 11}
C2 = {2, 3, 12, 14}
C3 = {5, 6, 8, 9, 10, 13}

G(Income) = I(Arb) - E(Income) = 1.531 - 0.564 = 0.967 bits
G(Credit history) = 0.266 bits
G(Debt) = 0.581 bits
G(Collateral) = 0.756 bits
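As a check, E(Income) can be recomputed from the class counts in each partition of the credit table, using the information_content/2 sketch above (expected_information/3 is my own hypothetical helper; sum_list/2 is an SWI-Prolog list built-in):

% expected_information(Partitions, Total, E): weighted average of the
% information content of the partitions Ci, where each partition is
% given as Ni-Probs (its size and its class probabilities)
expected_information(Partitions, Total, E) :-
    findall(W, ( member(Ni-Probs, Partitions),
                 information_content(Probs, Ii),
                 W is Ni / Total * Ii ),
            Ws),
    sum_list(Ws, E).

% Class counts read off the table: C1 = 4 x high,
% C2 = 2 x high + 2 x moderate, C3 = 5 x low + 1 x moderate.
% ?- expected_information([4-[1], 4-[2/4, 2/4], 6-[5/6, 1/6]], 14, E).
% E = 0.5642...   (the E(Income) = 0.564 used above)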

Learning performance

Let S be the set of learning examples

Divide S into a learning (training) set and a test set

Apply ID3 to the learning set

How many examples from the test set are correctly classified?

Repeat the steps above for different learning and test sets (LS and TS)

This gives a prediction of the learning performance

Graph: X - size of the learning set, Y - percentage of correctly classified test examples

"Happy graphs" (learning curves); a small evaluation sketch follows below
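A matching sketch of the evaluation step, reusing the hypothetical ex(Features, Class) examples and the classify/3 sketch from earlier in these notes:

% accuracy(Tree, TestSet, Acc): percentage of test examples whose
% class the decision tree predicts correctly
accuracy(Tree, TestSet, Acc) :-
    length(TestSet, N), N > 0,
    findall(x, ( member(ex(Features, Class), TestSet),
                 classify(Tree, Features, Class) ),
            Hits),
    length(Hits, NC),
    Acc is 100 * NC / N.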


Remarks

Lack of data

Attributes with many values tend to get a high information gain

Attributes with numerical values

Decision rules



3. Learning by clustering

Generalization and specialization

Learning examples

1. (yellow brick nice big +)
2. (blue ball nice small +)
3. (yellow brick dull small +)
4. (green ball dull big +)
5. (yellow cube nice big +)
6. (blue cube nice small -)
7. (blue brick nice big -)

Learning by clustering

concept name: NAME
  positive part
    cluster:  description: (yellow brick nice big)    ex: 1
  negative part
    ex:

concept name: NAME
  positive part
    cluster:  description: (_ _ nice _)    ex: 1, 2
  negative part
    ex:

Learning by clustering

concept name: NAME
  positive part
    cluster:  description: (_ _ _ _)    ex: 1, 2, 3, 4, 5
  negative part
    ex: 6, 7

over-generalization: the description (_ _ _ _) also covers the negative examples 6 and 7


Learning by clustering

concept name: NAME
  positive part
    cluster:  description: (yellow brick nice big)    ex: 1
    cluster:  description: (blue ball nice small)     ex: 2
  negative part
    ex: 6, 7


Learning by clustering

concept name: NAME
  positive part
    cluster:  description: (yellow brick _ _)    ex: 1, 3
    cluster:  description: (_ ball _ _)          ex: 2, 4
  negative part
    ex: 6, 7


Learning by clustering

concept name: NAME
  positive part
    cluster:  description: (yellow _ _ _)    ex: 1, 3, 5
    cluster:  description: (_ ball _ _)      ex: 2, 4
  negative part
    ex: 6, 7


Learned concept: A if yellow or ball

Learning by clustering - algorithm

1. Let S be the set of examples
2. Create PP (positive part) and NP (negative part)
3. Add all negative examples (ex-) from S to NP and remove them from S
4. Create a cluster in PP and add the first positive example (ex+) to it
5. S = S - {ex+}
6. for every positive example ei in S repeat
   6.1 for every cluster Ci repeat
       - Create the description ei + Ci (the most specific generalization of Ci that also covers ei)
       - if the description covers no negative example then add ei to Ci
   6.2 if ei has not been added to any cluster
       then create a new cluster with ei
end
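Step 6.1 can be sketched with the acopera/2 (covers) and generaliz/3 (most specific generalization) predicates that appear in the Prolog implementation at the end of this lecture; try_add/4 below is my own hypothetical wrapper:

% try_add(ClusterDescr, ExPos, Negatives, NewDescr): generalize the
% cluster description so that it also covers the new positive example,
% and accept the result only if it covers no negative example
try_add(ClusterDescr, ExPos, Negatives, NewDescr) :-
    generaliz(ClusterDescr, ExPos, NewDescr),
    \+ (member(Neg, Negatives), acopera(NewDescr, Neg)).

% ?- try_add([yellow,brick,nice,big], [yellow,brick,dull,small],
%            [[blue,cube,nice,small], [blue,brick,nice,big]], D).
% D = [yellow, brick, _, _].   (the cluster description of examples 1 and 3)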


4. Learning in version space

Generalization operators in version space

Replace constants with variables
    color(ball, red)  →  color(X, red)

Remove literals from conjunctions
    shape(X, round) ∧ size(X, small) ∧ color(X, red)  →  shape(X, round) ∧ color(X, red)

Add disjunctions
    shape(X, round) ∧ size(X, small) ∧ color(X, red)  →
    shape(X, round) ∧ size(X, small) ∧ (color(X, red) ∨ color(X, blue))

Replace a class with its superclass in is-a relations
    is-a(tom, cat)  →  is-a(tom, animal)

Candidate elimination algorithm

Version space = the set of concept descriptions which are consistent with the learning examples

The idea: reduce the version space based on the learning examples

One algorithm searches from specific to general
One algorithm searches from general to specific
One algorithm uses bidirectional search = the candidate elimination algorithm

[Figure: a portion of the concept space for obj(Size, Color, Shape), from the most general concept down to specific ones:
obj(X, Y, Z)
obj(X, Y, ball)    obj(X, red, Z)    obj(small, Y, Z)
obj(X, red, ball)    obj(small, Y, ball)    obj(small, red, Z)
obj(small, red, ball)    obj(small, orange, ball)]

Generalization and specialization

Let P and Q be the sets of sentences (instances) that unify with p and q in FOPL

p is more general than q if and only if P ⊇ Q
    color(X, red) is more general than color(ball, red)

"More general than" can be defined in terms of coverage:
    p more general than q:  p ≥ q
    ∀x p(x) → positive(x)
    ∀x q(x) → positive(x)
    p covers q if and only if: q(x) → positive(x) is a logical consequence of p(x) → positive(x)

Concept space: obj(X, Y, Z)
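For ground instances this coverage relation can be tested directly with term subsumption (a sketch using subsumes_term/2, which is available in SWI-Prolog; covers/2 and more_general/2 are my own helpers):

% covers(Concept, Instance): the instance is obtained from the concept
% by binding variables (the concept subsumes the instance)
covers(Concept, Instance) :- subsumes_term(Concept, Instance).

% more_general(P, Q): P covers Q but Q does not cover P
more_general(P, Q) :- subsumes_term(P, Q), \+ subsumes_term(Q, P).

% ?- covers(obj(X, red, Z), obj(small, red, ball)).
% true.
% ?- more_general(obj(A, B, C), obj(X, red, Z)).
% true.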


Generalization and specialization

A concept c is maximally specific if it covers all positive examples (ex+), does not cover any negative example (ex-), and for any concept c' which covers all ex+, c ≤ c'.  ->  the set S

A concept c is maximally general if it does not cover any ex-, and for any concept c' which does not cover any ex-, c ≥ c'.  ->  the set G

S - the set of hypotheses (candidate concepts) = the maximally specific generalizations
G - the set of hypotheses (candidate concepts) = the maximally general specializations

Algorithm for searching from specific to general

1. Initialize S with the first positive example (ex+)
2. Initialize N with the empty set
3. for every learning example repeat
   3.1 if a positive example p, then
       for each s ∈ S repeat
           - if s does not cover p then replace s with the most specific generalization which covers p
           - Remove from S all hypotheses more general than another hypothesis in S
           - Remove from S all hypotheses which cover a negative example from N
   3.2 if a negative example n, then
       - Remove from S all hypotheses which cover n
       - Add n to N (to check for overgeneralization)
end

Algorithm for searching from specific to general - example

S: { }
Positive: obj(small, red, ball)   ->  S: { obj(small, red, ball) }
Positive: obj(small, white, ball) ->  S: { obj(small, Y, ball) }
Positive: obj(large, blue, ball)  ->  S: { obj(X, Y, ball) }
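The generalization step of this trace can be sketched for obj/3 hypotheses as follows (msg/3 is my own hypothetical helper: keep the arguments on which the hypothesis and the new positive example agree and replace the others with fresh variables):

% msg(Hypothesis, Example, Generalization): most specific generalization
msg(obj(A1,B1,C1), obj(A2,B2,C2), obj(A,B,C)) :-
    gen_arg(A1, A2, A), gen_arg(B1, B2, B), gen_arg(C1, C2, C).

gen_arg(X, Y, X) :- X == Y, !.   % identical arguments are kept
gen_arg(_, _, _).                % differing arguments become variables

% ?- msg(obj(small, red, ball), obj(small, white, ball), G).
% G = obj(small, _, ball).       (the S: { obj(small, Y, ball) } step above)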

Algorithm for searching from general to specific

1. Initialize G with the most general description
2. Initialize P with the empty set
3. for every learning example repeat
   3.1 if a negative example n, then
       for each g ∈ G repeat
           - if g covers n then replace g with the most general specialization which does not cover n
           - Remove from G all hypotheses more specific than another hypothesis in G
           - Remove from G all hypotheses which do not cover the positive examples from P
   3.2 if a positive example p, then
       - Remove from G all hypotheses that do not cover p
       - Add p to P (to check for overspecialization)
end

Algorithm for searching from general to specific - example

G: { obj(X, Y, Z) }
Negative: obj(small, red, brick)  ->
    G: { obj(large, Y, Z), obj(X, white, Z), obj(X, blue, Z), obj(X, Y, ball), obj(X, Y, cube) }
Positive: obj(large, white, ball) ->
    G: { obj(large, Y, Z), obj(X, white, Z), obj(X, Y, ball) }
Negative: obj(large, blue, cube)  ->
    G: { obj(X, white, Z), obj(X, Y, ball) }
Positive: obj(small, blue, ball)  ->
    G: { obj(X, Y, ball) }
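The first specialization step of this trace can be reproduced with a small sketch. The domains of the attribute values are assumed here, since the slides do not list them; copy_term/2 is standard and nth1/3 comes from SWI-Prolog's list library.

% Assumed domains of the three arguments of obj/3
domain(1, [small, large]).
domain(2, [red, white, blue]).
domain(3, [ball, brick, cube]).

% specialize(Concept, Negative, Spec): a most general specialization of
% Concept that does not cover Negative, obtained by binding one
% still-free argument to a value other than the negative's value
specialize(Concept, obj(N1,N2,N3), Spec) :-
    copy_term(Concept, Spec),
    Spec = obj(S1,S2,S3),
    nth1(I, [S1,S2,S3], Arg),
    var(Arg),
    nth1(I, [N1,N2,N3], NVal),
    domain(I, Dom),
    member(Arg, Dom),
    Arg \== NVal.

% ?- specialize(obj(X, Y, Z), obj(small, red, brick), S).
% S = obj(large, _, _) ; S = obj(_, white, _) ; S = obj(_, blue, _) ;
% S = obj(_, _, ball) ;  S = obj(_, _, cube).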

Algorithm for searching in version space (candidate elimination)

1. Initialize G with the most general description
2. Initialize S with the first positive example (ex+)
3. for every learning example repeat
   3.1 if a positive example p, then
       3.1.1 Remove from G all the elements that do not cover p
       3.1.2 for each s ∈ S repeat
             - if s does not cover p then replace s with the most specific generalization which covers p
             - Remove from S all hypotheses more general than another hypothesis in S
             - Remove from S all hypotheses more general than another hypothesis in G
   3.2 if a negative example n, then
       3.2.1 Remove from S all the hypotheses that cover n
       3.2.2 for each g ∈ G repeat
             - if g covers n then replace g with the most general specialization which does not cover n
             - Remove from G all hypotheses more specific than another hypothesis in G
             - Remove from G all hypotheses more specific than another hypothesis in S
4. if G = S and card(S) = 1 then a concept has been found
5. if G = S = { } then there is no concept consistent with all the examples
end

Algorithm for searching in version space - example

G: { obj(X, Y, Z) }          S: { }
Positive: obj(small, red, ball)  ->
    G: { obj(X, Y, Z) }          S: { obj(small, red, ball) }
Negative: obj(small, blue, ball) ->
    G: { obj(X, red, Z) }        S: { obj(small, red, ball) }
Positive: obj(large, red, ball)  ->
    G: { obj(X, red, Z) }        S: { obj(X, red, ball) }
Negative: obj(large, red, cube)  ->
    G: { obj(X, red, ball) }     S: { obj(X, red, ball) }

Implementation of the algorithm specific to general

% exemple/1 holds the list of positive (pos) and negative (neg) learning examples
exemple([pos([large,white,ball]), neg([small,red,brick]),
         pos([small,blue,ball]), neg([large,blue,cube])]).

% acopera(Hypothesis, Instance): the hypothesis covers the instance
% (a variable in the hypothesis matches any attribute value)
acopera([], []).
acopera([H1|T1], [H2|T2]) :- var(H1), var(H2), acopera(T1,T2).
acopera([H1|T1], [H2|T2]) :- var(H1), atom(H2), acopera(T1,T2).
acopera([H1|T1], [H2|T2]) :- atom(H1), atom(H2), H1=H2, acopera(T1,T2).

% maigeneral(X, Y): hypothesis X is strictly more general than hypothesis Y
maigeneral(X,Y) :- not(acopera(Y,X)), acopera(X,Y).

% generaliz(Hypothesis, Instance, Generalization): keep the attributes
% identical to the instance's values, replace the differing ones with
% fresh variables
generaliz([], [], []).
generaliz([Atrib|Rest], [Inst|RestInst], [Atrib|RestGen]) :-
    Atrib == Inst, generaliz(Rest,RestInst,RestGen).
generaliz([Atrib|Rest], [Inst|RestInst], [_|RestGen]) :-
    Atrib \= Inst, generaliz(Rest,RestInst,RestGen).

% specgen/0: run the specific-to-general search on the stored examples
specgen :- exemple([pos(H)|Rest]), speclagen([H], [], Rest).

% speclagen(Hypotheses, Negatives, RemainingExamples): process the
% examples one by one and print the final hypotheses and negatives
speclagen(H, N, []) :- print('H='), print(H), nl,
    print('N='), print(N), nl.
speclagen(H, N, [Ex|RestEx]) :- process(Ex, H, N, H1, N1),
    speclagen(H1, N1, RestEx).

% positive example: generalize the hypotheses, then remove those more
% general than another hypothesis or covering a stored negative example
process(pos(Ex), H, N, H1, N) :-
    generalizset(H, HGen, Ex),
    elim(X, HGen, (member(Y,HGen), maigeneral(X,Y)), H2),
    elim(X, H2, (member(Y,N), acopera(X,Y)), H1).
% negative example: remove the hypotheses that cover it and store it
process(neg(Ex), H, N, H1, [Ex|N]) :-
    elim(X, H, acopera(X,Ex), H1).

% elim(X, L, Goal, L1): L1 contains the elements X of L for which Goal fails
elim(X,L,Goal,L1) :-
    (bagof(X, (member(X,L), not(Goal)), L1) ; L1=[]).



% generalizset(Hypotheses, NewHypotheses, Example): generalize every
% hypothesis that does not already cover the positive example
generalizset([], [], _).
generalizset([Ipot|Rest], IpotNoua, Ex) :-
    not(acopera(Ipot,Ex)),
    (bagof(X, generaliz(Ipot,Ex,X), ListIpot) ; ListIpot=[]),
    generalizset(Rest,RestNou,Ex),
    append(ListIpot,RestNou,IpotNoua).
generalizset([Ipot|Rest], [Ipot|RestNou], Ex) :-
    acopera(Ipot,Ex),
    generalizset(Rest,RestNou,Ex).

% Sample run:
?- specgen.
H=[[_G390, _G393, ball]]
N=[[large, blue, cube], [small, red, brick]]