
Algorithms of Artificial Intelligence

Lecture 6: Learning

E. Tyugu


Content

- Basic concepts
  - transfer function
  - classification
  - stages of usage
- Perceptron
- Hopfield net
- Hamming net
- Carpenter-Grossberg's net
- Kohonen's feature maps
- Bayesian networks
- ID3
- AQ

Neural nets

Neural nets provide another form of massively parallel learning functionality. They are well suited for learning pattern recognition. A simple way to describe a neural net is to represent it as a graph. Each node of the graph has an associated variable called state and a constant called threshold. Each arc of the graph has an associated numeric value called weight. Behaviour of a neural net is determined by transfer functions for nodes, which compute new values of states from the previous states of neighbouring nodes.

Node of a net

A common transfer function is of the form

    xj = f(Σi wij*xi - tj)

where the sum is taken over the incoming arcs with weights wij, the xi are the states of the neighbouring nodes, and tj is the threshold of the node j where the new state is computed. Learning in neural nets means changing the weights in the right way.

[Figure: a node with transfer function f and state xj, fed by nodes x1, ..., xn over arcs with weights w1j, ..., wnj.]
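
A minimal sketch (not part of the original slides) of this node computation in Python; the weights, states and threshold below are illustrative, and f can be any of the transfer functions shown on the next slide.

    # Sketch of x_j = f(sum_i w_ij*x_i - t_j) for a single node j.
    def node_state(weights, states, threshold, f):
        # Weighted sum over the incoming arcs, minus the node's threshold.
        s = sum(w * x for w, x in zip(weights, states))
        return f(s - threshold)

    # Example with a hard limiter as the transfer function f.
    hard_limiter = lambda v: 1.0 if v >= 0 else -1.0
    print(node_state([0.5, -0.2, 0.8], [1.0, -1.0, 1.0], 0.3, hard_limiter))  # -> 1.0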

Transfer functions

[Figure: graphs of three common transfer functions f(x): the hard limiter (jumping between -1 and +1), threshold logic (linear, saturating at +1) and the sigmoid (smooth, saturating at +1).]
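
A sketch of the three transfer functions in Python; the slide only shows their graphs, so the exact slope of the threshold-logic ramp and the steepness of the sigmoid below are assumptions.

    import math

    def hard_limiter(v):
        # Jumps from -1 to +1 at zero.
        return 1.0 if v >= 0 else -1.0

    def threshold_logic(v):
        # Linear ramp, clipped to the range [0, 1].
        return min(max(v, 0.0), 1.0)

    def sigmoid(v):
        # Smooth and monotone, saturating towards 0 and +1.
        return 1.0 / (1.0 + math.exp(-v))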

Forward-pass and layered nets

1. A forward-pass neural net is an acyclic graph. Its nodes can be classified as input, output and internal nodes. Input nodes do not have neighbours on incoming arcs, output nodes do not have them on outgoing arcs, and internal nodes possess both kinds of neighbours.

2. A layered (n-layered) net is a forward-pass net where each path from an input node to an output node contains exactly n nodes. Each node in such a graph belongs to exactly one layer. An n-layered net is strongly connected if each node in the i-th layer is connected to all nodes of the (i+1)-st layer, i = 1, 2, ..., n-1. States of the layered net can be interpreted as decisions made on the basis of the states of the input nodes.

Layered neural net

[Figure: a layered net with a row of input nodes, intermediate nodes and output nodes.]

Learning in a layered net can be performed by means of back-propagation. In this case, the states taken by the output nodes are evaluated and credit or blame is assigned to each output node. The evaluations are propagated back to the other layers.

Stages of usage


1. Selection of the structure (of the network type)

2. Assignment of initial weights

3. Learning/teaching

4. Application



Perceptrons

A perceptron's nodes are hard limiters or sigmoids.

Examples:

[Figure: single-layer, double-layer and three-layer perceptron topologies.]

Learning in a single-layer perceptron

1. Initialize the weights wi and the threshold t to small random values.

2. Take an input x1, ..., xn and the desired output d.

3. Calculate the output x of the perceptron.

4. Adapt the weights:

    wi' = wi + h*(d - x)*xi,

   where h < 1 is a positive gain value and

    d = +1, if the input is from one class,
    d = -1, if the input is from the other class.

Repeat steps 2-4 if needed.

NB! The weights are changed only when the output x is incorrect; if x = d, then d - x = 0 and the weights stay the same.
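
A minimal sketch of these steps in Python, assuming a hard-limiter output node; the training data, the gain value and the extra threshold update (the slide only adapts the weights) are illustrative choices, not part of the original algorithm.

    import random

    def train_perceptron(samples, h=0.1, epochs=20):
        """samples: list of (inputs, d) pairs with d = +1 or -1."""
        n = len(samples[0][0])
        w = [random.uniform(-0.05, 0.05) for _ in range(n)]      # step 1
        t = random.uniform(-0.05, 0.05)
        for _ in range(epochs):
            for xs, d in samples:                                # step 2
                s = sum(wi * xi for wi, xi in zip(w, xs)) - t
                x = 1 if s >= 0 else -1                          # step 3: hard limiter
                w = [wi + h * (d - x) * xi for wi, xi in zip(w, xs)]   # step 4
                t = t - h * (d - x)        # extra: treat the threshold as a bias weight
        return w, t

    # Example: a linearly separable two-input problem (class = sign of the first input).
    data = [([1, 1], 1), ([1, -1], 1), ([-1, 1], -1), ([-1, -1], -1)]
    print(train_perceptron(data))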

Regions separable by perceptrons

[Figure: example decision regions separating classes A and B that can be formed by single-layered, double-layered and three-layered perceptrons.]

Hopfield net

[Figure: a Hopfield net with nodes x1, x2, ..., xn and outputs x1', x2', ..., xn'.]

Every node is connected to all other nodes, and the weights are symmetric (wij = wji). The net works with binary (+1, -1) input signals; the output is also a tuple of values +1 or -1. Even a sigmoid can be used as the transfer function.

Hopfield net (continued)

1. Initialize the connection weights:

    wij = Σs xis * xjs,  i ≠ j,

   where xis is +1 or -1 as in the description x1s, ..., xns of the class s.

2. Initialise the states with an unknown pattern x1, ..., xn.

3. Iterate until convergence (this can even be done asynchronously):

    xj' = f(Σi wij*xi),

   where f is the hard limiter.

Remarks:

- A Hopfield net can be used either as a classifier or as an associative memory.
- It always converges, but no match may occur.
- It works well when the number of classes is less than 0.15*n.
- There are several modifications of the Hopfield net architecture.
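
A minimal sketch of this procedure in Python for +1/-1 patterns; the stored patterns and the corrupted probe below are illustrative.

    def train_hopfield(patterns):
        # Step 1: w_ij = sum over patterns s of x_i^s * x_j^s, with w_ii = 0.
        n = len(patterns[0])
        return [[0 if i == j else sum(p[i] * p[j] for p in patterns)
                 for j in range(n)] for i in range(n)]

    def recall(w, x, iterations=10):
        # Steps 2-3: start from an unknown pattern and apply the hard-limiter
        # update asynchronously, node by node, until nothing changes.
        x, n = list(x), len(x)
        for _ in range(iterations):
            changed = False
            for j in range(n):
                s = sum(w[i][j] * x[i] for i in range(n))
                new = 1 if s >= 0 else -1
                changed |= new != x[j]
                x[j] = new
            if not changed:
                break
        return x

    patterns = [[1, 1, 1, -1, -1, -1], [-1, -1, 1, 1, 1, -1]]
    w = train_hopfield(patterns)
    print(recall(w, [1, 1, -1, -1, -1, -1]))   # a corrupted version of the first pattern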

Hamming net

- The Hamming net calculates the Hamming distance to the exemplar of each class and shows a positive output for the class with the minimal distance.
- This net is widely used for restoring corrupted binary fixed-length signals.
- The Hamming net works faster than the Hopfield net and has fewer connections for a larger number of input signals.
- It implements the optimum minimum-error classifier when bit errors are random and independent.

Hamming net (continued)

[Figure: a Hamming net with inputs x1, x2, ..., xn, a lower subnet with middle nodes z1, z2, ..., zm that calculates the Hamming distances, and an upper subnet with outputs y1, y2, ..., ym that selects the best match.]

Hamming net (continued)

- The value at a middle node zs is n - hds, where hds is the Hamming distance to the exemplar pattern ps.
- Therefore, in the lower subnet the weight from input xi to the middle node zs is wis = xis/2, with the offset ts = n/2 for each exemplar s.
- Indeed, this gives 0 for the most incorrect code, and 1 = (+1 - (-1))*xis/2 is added for each correct input signal, so that it gives n for the correct code.

Hamming net (continued)

1. Initialize the weights and offsets:
   a) lower subnet: wis = xis/2, ts = n/2 for each exemplar s;
   b) upper subnet: tk = 0, wsk = 1 if k = s, else -e, where 0 < e < 1/m.

2. Initialize the lower subnet with an (unknown) pattern x1, ..., xn and calculate the matching scores (the offset is applied so that yj = n - hdj, as above):

    yj = f(Σi wij*xi + tj).

3. Iterate in the upper subnet until convergence:

    yj' = f(yj - e*Σ(k≠j) yk).
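
A minimal sketch of the whole procedure in Python for +1/-1 exemplars; the exemplar patterns, the probe and the particular value of e are illustrative, and threshold logic clipped at zero is assumed for f in the upper subnet.

    def hamming_net(exemplars, x, e=None, iterations=100):
        m, n = len(exemplars), len(exemplars[0])
        e = e if e is not None else 1.0 / (2 * m)        # 0 < e < 1/m
        # Steps 1a + 2: lower subnet, y_s = sum_i (x_is/2)*x_i + n/2 = n - HammingDistance.
        y = [sum(p[i] * x[i] for i in range(n)) / 2 + n / 2 for p in exemplars]
        # Step 3: upper subnet, iterate until only the best match stays positive.
        f = lambda v: max(v, 0.0)                        # threshold logic clipped at zero
        for _ in range(iterations):
            y_new = [f(y[j] - e * sum(y[k] for k in range(m) if k != j)) for j in range(m)]
            if y_new == y:
                break
            y = y_new
        return y

    exemplars = [[1, 1, 1, 1, -1, -1], [-1, -1, 1, 1, 1, 1]]
    print(hamming_net(exemplars, [1, 1, 1, -1, -1, -1]))   # closest to the first exemplar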

A comparator subnet

Here is a comparator subnet that selects the maximum of two analog inputs x0, x1. By combining several of these nets one builds comparators for more inputs (4, 8 etc., approximately log2 n layers for n inputs). The output z is the maximum value, y0 and y1 indicate which input is the maximum, dark nodes are hard limiters, light nodes are threshold-logic nodes, all thresholds are 0, and the weights are shown on the arcs.

[Figure: the comparator subnet with inputs x0, x1 and outputs y0, y1, z; the arc weights are +1, -1 and 0.5.]

Carpenter-Grossberg net

- This net forms clusters without supervision. Its clustering algorithm is similar to the simple leader clustering algorithm (sketched below):
  - select the first input as the exemplar for the first cluster;
  - if the next input is close enough to some cluster exemplar, it is added to that cluster; otherwise it becomes the exemplar of a new cluster.
- The net includes much feedback and is described by nonlinear differential equations.
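
A minimal sketch of the leader clustering scheme just described (not of the Carpenter-Grossberg net itself, whose dynamics are given by differential equations); binary input vectors, the Hamming distance and the distance threshold are assumptions for illustration.

    def leader_clustering(inputs, max_distance=1):
        clusters = []                        # list of (exemplar, members)
        for x in inputs:
            for exemplar, members in clusters:
                # Hamming distance to the cluster exemplar.
                if sum(a != b for a, b in zip(exemplar, x)) <= max_distance:
                    members.append(x)
                    break
            else:
                clusters.append((x, [x]))    # x becomes the exemplar of a new cluster
        return clusters

    data = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 1, 1, 1)]
    for exemplar, members in leader_clustering(data):
        print(exemplar, members)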

Carpenter-Grossberg net (continued)

[Figure: a Carpenter-Grossberg net for three binary inputs x0, x1, x2 and two classes.]

Kohonen's feature maps

- A Kohonen self-organizing feature map (K-map) uses an analogy with biological neural structures in which the placement of neurons is orderly and reflects the structure of external (sensed) stimuli (e.g. in auditory and visual pathways).
- A K-map learns when continuous-valued input vectors are presented to it without specifying the desired output. The weights of the connections can adjust to regularities in the input. A large number of examples is needed.
- A K-map mimics learning in biological neural structures well.
- It is usable in speech recognizers.

Kohonen's feature maps (continued)

- This is a flat (two-dimensional) structure with connections between neighbours and connections from each input node to all output nodes.
- It learns clusters of input vectors without any help from a teacher and preserves closeness (topology).

[Figure: a continuous-valued input vector connected to a two-dimensional grid of output nodes.]

Learning in K-maps

1. Initialize the weights to small random numbers and set the initial radius of the neighbourhood of nodes.

2. Get an input x1, ..., xn.

3. Compute the distance dj to each output node:

    dj = Σi (xi - wij)^2

4. Select the output node s with minimal distance ds.

5. Update the weights for the node s and all nodes in its neighbourhood:

    wij' = wij + h*(xi - wij),

   where h < 1 is a gain that decreases in time.

Repeat steps 2-5.
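
A minimal sketch of these steps in Python for a one-dimensional line of output nodes (the slides use a two-dimensional grid); the gain and neighbourhood schedules and the random input data are assumptions.

    import random

    def train_kmap(inputs, n_outputs=10, epochs=50):
        dim = len(inputs[0])
        # Step 1: small random weights and an initial neighbourhood radius.
        w = [[random.uniform(0, 0.1) for _ in range(dim)] for _ in range(n_outputs)]
        radius, h = n_outputs // 2, 0.5
        for _ in range(epochs):
            for x in inputs:                                     # step 2
                # Step 3: squared distance d_j = sum_i (x_i - w_ij)^2 to each output node.
                d = [sum((xi - wj[i]) ** 2 for i, xi in enumerate(x)) for wj in w]
                s = d.index(min(d))                              # step 4: best matching node
                # Step 5: move the winner and its neighbours towards the input.
                for j in range(max(0, s - radius), min(n_outputs, s + radius + 1)):
                    w[j] = [wij + h * (xi - wij) for wij, xi in zip(w[j], x)]
            # The gain and the neighbourhood radius decrease in time.
            h *= 0.95
            radius = max(1, int(radius * 0.9))
        return w

    inputs = [[random.random(), random.random()] for _ in range(100)]
    print(train_kmap(inputs)[:3])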

Bayesian networks

- Bayesian networks use the conditional probability formula

    P(e,H) = P(H|e)P(e) = P(e|H)P(H)

  binding the conditional probabilities of the evidence e and the hypothesis H.

- A Bayesian network is a graph whose nodes are variables denoting the occurrence of events; the arcs express causal dependence of events. Each node x has conditional probabilities for every possible combination of events influencing the node, i.e. for every collection of events in the nodes of pred(x) immediately preceding the node x in the graph.

Bayesian networks (continued)

Example:

[Figure: a Bayesian network with nodes x1, ..., x6 and arcs x1→x2, x1→x3, x1→x4, x2→x4, x2→x5, x3→x5, x5→x6, matching the factorisation below.]

The joint probability assessment for all nodes x1, ..., xn:

    P(x1,…,xn) = P(x1|pred(x1))*...*P(xn|pred(xn))

constitutes a joint-probability model that supports the assessed event combination. For the present example it is as follows:

    P(x1,…,x6) = P(x6|x5)*P(x5|x2,x3)*P(x4|x1,x2)*P(x3|x1)*P(x2|x1)*P(x1)
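
A minimal sketch in Python of evaluating this factorisation for boolean events; the network structure is read off the formula above, while all numeric probabilities in the tables below are made-up placeholders.

    # pred(x) for each node, taken from the factorisation above.
    parents = {
        "x1": [], "x2": ["x1"], "x3": ["x1"],
        "x4": ["x1", "x2"], "x5": ["x2", "x3"], "x6": ["x5"],
    }

    # Conditional probability tables: P(node = True | parent values).
    # Keys are tuples of parent truth values; all numbers are illustrative only.
    cpt = {
        "x1": {(): 0.3},
        "x2": {(True,): 0.8, (False,): 0.1},
        "x3": {(True,): 0.6, (False,): 0.2},
        "x4": {(True, True): 0.9, (True, False): 0.5,
               (False, True): 0.4, (False, False): 0.05},
        "x5": {(True, True): 0.95, (True, False): 0.7,
               (False, True): 0.6, (False, False): 0.1},
        "x6": {(True,): 0.85, (False,): 0.2},
    }

    def joint_probability(assignment):
        # P(x1,...,x6) = product over nodes of P(node | pred(node)).
        p = 1.0
        for node, pars in parents.items():
            p_true = cpt[node][tuple(assignment[q] for q in pars)]
            p *= p_true if assignment[node] else 1.0 - p_true
        return p

    print(joint_probability({"x1": True, "x2": True, "x3": False,
                             "x4": True, "x5": True, "x6": False}))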

Bayesian networks (continued)

- A Bayesian network can be used for diagnosis/classification: given some events, the probabilities of the events depending on the given ones can be predicted.
- To construct a Bayesian network, one needs to
  - determine its structure (topology);
  - find the conditional probabilities for each dependency.

Taxonomy of neural nets

- Neural nets
  - binary-valued inputs
    - supervised learning: Hopfield nets, Hamming nets
    - unsupervised learning: Carpenter-Grossberg nets
  - continuous inputs
    - supervised learning: single-layered perceptrons, multi-layered perceptrons
    - unsupervised learning: Kohonen maps

A decision tree

outlook?
  sunny: humidity?
    high: -
    normal: +
  overcast: +
  rain: windy?
    true: -
    false: +

ID3 algorithm

To get the fastest decision-making procedure, one has to arrange the attributes in a decision tree in a proper order - the most discriminating attributes first. This is done by the algorithm called ID3.

The most discriminating attribute can be defined in precise terms as the attribute for which fixing its value changes the entropy of the possible decisions the most. Let wj be the frequency of the j-th decision in a set of examples x. Then the entropy of the set is

    E(x) = - Σj wj*log(wj)

Let fix(x,a,v) denote the set of those elements of x whose value of attribute a is v. The average entropy that remains in x after the value of a has been fixed is

    H(x,a) = Σv kv * E(fix(x,a,v)),

where kv is the ratio of examples in x with attribute a having value v.
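
A minimal sketch of E(x), fix(x,a,v) and H(x,a) in Python; representing an example as a dict of attribute values with a "class" key for the decision is an assumption for illustration.

    from math import log2
    from collections import Counter

    def E(examples):
        # E(x) = - sum_j w_j * log(w_j), where w_j is the frequency of the j-th decision.
        counts = Counter(e["class"] for e in examples)
        total = len(examples)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def fix(examples, a, v):
        # Elements of x whose value of attribute a is v.
        return [e for e in examples if e[a] == v]

    def H(examples, a):
        # H(x,a) = sum_v k_v * E(fix(x,a,v)), k_v = ratio of examples with a = v.
        values = {e[a] for e in examples}
        return sum(len(fix(examples, a, v)) / len(examples) * E(fix(examples, a, v))
                   for v in values)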

ID3 algorithm (continued)

ID3 uses the following variables and functions:

p -- pointer to the root of the decision tree being built;
x -- set of examples;
E(x) -- entropy of the set of examples x;
H(x,a) -- average entropy that remains in x after the value of a has been fixed;
atts(x) -- attributes of the set of examples x;
vals(a) -- values of the attribute a;
mark(p,d) -- mark node p with d;
newsucc(p,v) -- new successor to the node p with attribute value v; returns a pointer p to the new node;
fix(x,a,v) -- subset of the given set of examples x with the value v of the attribute a.

ID3 (continued)

A.3.10:
ID3(x,p) =
    if empty(x) then failure
    elif E(x) = 0 then mark(p, decision(x))
    else h := bignumber;
         for a ∈ atts(x) do
             if H(x,a) < h then h := H(x,a); am := a fi
         od;
         mark(p, am);
         for v ∈ vals(am,x) do
             ID3(fix(x,am,v), newsucc(p,v))
         od
    fi
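
A minimal runnable sketch of A.3.10 in Python, reusing the E, H and fix helpers sketched after the entropy definitions; instead of mark/newsucc pointers, the tree is returned as nested dicts, and the examples are assumed to be consistent (each pure subset really has zero entropy).

    def atts(examples):
        # All attributes except the decision itself.
        return [a for a in examples[0] if a != "class"]

    def decision(examples):
        return examples[0]["class"]

    def id3(examples):
        if not examples:
            return "failure"
        if E(examples) == 0:                        # all examples agree: a leaf node
            return decision(examples)
        # Choose the attribute am with the smallest remaining entropy H(x,a).
        am = min(atts(examples), key=lambda a: H(examples, a))
        values = {e[am] for e in examples}
        return {am: {v: id3(fix(examples, am, v)) for v in values}}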

AQ algorithm

- This algorithm is for learning knowledge in the form of rules.
- The algorithm AQ(ex,cl) builds a set of rules from the given set of examples ex for the collection of classes cl, using the function aqrules(p,n,c) for building a set of rules for a class c from its given positive examples p and negative examples n.
- pos(ex,c) is the set of positive examples for class c in ex.
- neg(ex,c) is the set of negative examples for class c in ex.
- covers(r,e) is a predicate which is true when the example e satisfies the rule r.
- prune(rules) throws away the rules covered by some other rule.

AQ (continued)

A.3.11:
AQ(ex,cl) =
    allrules := { };
    for c ∈ cl do
        allrules := allrules ∪ aqrules(pos(ex,c), neg(ex,c), c)
    od;
    return(allrules)

aqrules(pos,neg,c) =
    rules := {aqrule(selectFrom(pos), neg, c)};
    for e ∈ pos do
        L: { for r ∈ rules do
                 if covers(r,e) then break L fi
             od;
             rules := rules ∪ {aqrule(e,neg,c)};
             prune(rules)
           }
    od;
    return(rules)

AQ (continued)

aqrule(seed,neg,c) -- builds a new rule from the initial condition seed and the negative examples neg for the class c;
newtests(r,seed,e) -- generates amendments q to the rule r such that r&q covers seed and not e;
worstelements(star) -- chooses the least promising elements in star.

aqrule(seed,neg,c) =
    star := {true};
    for e ∈ neg do
        for r ∈ star do
            if covers(r,e) then
                star := (star ∪ {r&q | q ∈ newtests(r,seed,e)}) \ {r}
            fi;
            while size(star) > maxstar do
                star := star \ worstelements(star)
            od
        od
    od;
    return("if" bestin(star) "then" c)
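
A simplified sketch of aqrule and aqrules in Python, under these assumptions (not from the slides): an example is a dict of attribute values, a rule is a conjunction of attribute = value tests represented as a frozenset, newtests takes single tests from the seed on attributes where the negative example differs, worstelements keeps only the shortest rules, and prune is omitted.

    def covers(rule, example):
        # A rule covers an example when every test attribute = value holds for it.
        return all(example.get(a) == v for a, v in rule)

    def newtests(rule, seed, neg_example):
        # Single tests taken from the seed on attributes where it differs from the
        # negative example; r & q then still covers seed but not neg_example.
        return [frozenset([(a, v)]) for a, v in seed.items() if neg_example.get(a) != v]

    def aqrule(seed, neg, c, maxstar=5):
        star = [frozenset()]                      # the empty rule plays the role of "true"
        for e in neg:
            for r in list(star):
                if covers(r, e):
                    star.remove(r)
                    star.extend(r | q for q in newtests(r, seed, e))
            # Keep the star small, preferring shorter (more general) rules.
            star = sorted(set(star), key=lambda r: (len(r), sorted(r)))[:maxstar]
        best = star[0] if star else frozenset(seed.items())
        return (best, c)                          # read as "if best then c"

    def aqrules(pos, neg, c):
        rules = [aqrule(pos[0], neg, c)]
        for e in pos:
            if not any(covers(r, e) for r, _ in rules):
                rules.append(aqrule(e, neg, c))
        return rules

    pos = [{"outlook": "sunny", "windy": "false"}, {"outlook": "overcast", "windy": "false"}]
    neg = [{"outlook": "rain", "windy": "true"}]
    print(aqrules(pos, neg, "+"))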

A clustering problem

(learning without a teacher)


Hierarchy of learning methods

[Figure: a hierarchy of learning methods. Learning divides into parametric learning (by automata, numerically), massively parallel learning (neural nets, genetic algorithms) and symbolic learning (search in concept space: specific to general, general to specific; inductive inference; inverse resolution).]

Decision tables

- A decision table is a compact way of representing knowledge when the knowledge serves for making a choice from a finite (and not particularly large) set of possibilities.
- A decision table is a compound table consisting of areas of three types.
- The condition list (C1, C2, ..., Ck are conditions, written down in some formal language that can be translated into a program):

    C1
    C2
    ...
    Ck

Decision tables (continued)

The selection matrix consists of columns corresponding to the conditions and rows corresponding to the choice alternatives. Each cell of the table may contain one of three values:

y - yes, the condition must be satisfied;
n - no, the condition must not be satisfied;
0 - it does not matter whether the condition is satisfied or not (the cell is then often simply left empty).

    C1 C2 ... Ck

Decision tables (continued)

The third area of the table holds the decisions to be chosen. If there are two areas each of the first and second type, the third area can also be given in matrix form (a sketch of evaluating such a table follows below).

[Figure: an example decision table with two condition areas filled with y/n entries and a decision matrix.]
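
A minimal sketch (not from the slides) of evaluating a decision table in Python: the conditions are written as executable predicates, and each rule pairs a column of y/n entries (absent conditions are "don't care") with a decision; all names and values below are illustrative.

    conditions = {
        "C1": lambda data: data["temperature"] > 30,   # illustrative conditions
        "C2": lambda data: data["windy"],
    }

    # Each rule pairs a selection pattern with the decision to take.
    rules = [
        ({"C1": "y", "C2": "n"}, "decision A"),
        ({"C1": "n"},            "decision B"),        # C2 does not matter here
    ]

    def decide(data):
        truth = {name: cond(data) for name, cond in conditions.items()}
        for pattern, decision in rules:
            if all((truth[c] if want == "y" else not truth[c])
                   for c, want in pattern.items()):
                return decision
        return None

    print(decide({"temperature": 35, "windy": False}))   # -> "decision A"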

Bibliography

Kohonen, T. (1984) Self-Organization and Associative Memory. Springer-Verlag.

Lippmann, R. (1987) An Introduction to Computing with Neural Nets. IEEE ASSP Magazine, No. 4, 4-22.

Michalski, S. (1983) Theory and methodology of inductive learning. In: S. Michalski, J. Carbonell, T. Mitchell, eds. Machine Learning: an Artificial Intelligence Approach. Tioga, Palo Alto, 83-134.

Exercises

Sample data for ID3:

Outlook   Temperature  Humidity  Windy  Class
Sunny     Hot          High      False  -
Sunny     Hot          High      True   -
Overcast  Hot          High      False  +
Rain      Mild         High      False  +
Rain      Cool         Normal    False  +
Rain      Cool         Normal    True   -
Overcast  Cool         Normal    True   +
Sunny     Mild         High      False  -
Sunny     Cool         Normal    False  +
Rain      Mild         Normal    False  +
Sunny     Mild         Normal    True   +
Overcast  Mild         High      True   +
Overcast  Hot          Normal    False  +
Rain      Mild         High      True   -

1. Calculate the entropies of the attributes.
2. Build a decision tree.