# Algorithms of Artificial Intelligence

Artificial Intelligence and Robotics

7 Nov 2013


Algorithms of Artificial Intelligence

Lecture 6: Learning

E. Tyugu

Content

- Basic concepts
  - transfer function
  - classification
  - stages of usage
- Perceptron
- Hopfield net
- Hamming net
- Carpenter-Grossberg's net
- Kohonen's feature maps
- Bayesian networks
- ID3
- AQ

Neural nets

Neural nets provide another form of massively parallel learning functionality. They are well suited for learning pattern recognition.

A simple way to describe a neural net is to represent it as a graph. Each node of the graph has an associated variable called state and a constant called threshold. Each arc of the graph has an associated numeric value called weight. The behaviour of a neural net is determined by transfer functions of nodes, which compute new values of states from the previous states of neighbouring nodes.

Node of a net

A common transfer function is of the form

    xj = f(Σi wij*xi - tj)

where the sum is taken over incoming arcs with weights wij, the xi are states of the neighbouring nodes, and tj is the threshold of the node j where the new state is computed. Learning in neural nets means changing the weights in the right way.

[Figure: a node with state xj and transfer function f, with inputs x1, ..., xn on arcs weighted w1j, ..., wnj.]
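The node computation above can be sketched directly (a minimal sketch; using the hard limiter as f is an assumption here, since the transfer functions are only introduced on the next slide):

```python
def hard_limiter(x):
    # f(x) = +1 for non-negative input, -1 otherwise
    return 1.0 if x >= 0 else -1.0

def node_state(weights, states, threshold, f=hard_limiter):
    # xj = f(sum_i wij*xi - tj): weighted sum of neighbour states minus threshold
    s = sum(w * x for w, x in zip(weights, states))
    return f(s - threshold)

# a node with two incoming arcs: f(0.5 - 0.25 - 0.125) = f(0.125) = +1
print(node_state([0.5, -0.25], [1.0, 1.0], 0.125))
```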

5

Transfer functions

hard limiter
sigmoid
threshold logic
-1
+1
+1
+1
x
x
x
f(x)
f(x)
f(x)
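The three functions in the figure can be written down directly (a sketch; the unit slope of the threshold-logic ramp is an assumption):

```python
import math

def hard_limiter(x):
    # jumps between -1 and +1 at x = 0
    return 1.0 if x >= 0 else -1.0

def sigmoid(x):
    # smooth transition from 0 to +1
    return 1.0 / (1.0 + math.exp(-x))

def threshold_logic(x):
    # linear ramp (slope 1 assumed) clipped to the range [0, +1]
    return min(max(x, 0.0), 1.0)
```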

Forward-pass and layered nets

1. A forward-pass neural net is an acyclic graph. Its nodes can be classified as input, output and internal nodes. Input nodes have no neighbours on incoming arcs, output nodes have none on outgoing arcs, and internal nodes possess both kinds of neighbours.

2. A layered (n-layered) net is a forward-pass net where each path from an input node to an output node contains exactly n nodes. Each node in such a graph belongs to exactly one layer. An n-layered net is strongly connected if each node in the i-th layer is connected to all nodes of the (i+1)-st layer, i = 1, 2, ..., n-1.

States of the layered net can be interpreted as decisions made on the basis of the states of the input nodes.

Layered neural net

[Figure: a layered net with input nodes, intermediate nodes and output nodes.]

Learning in a layered net can be performed by means of back-propagation. In this case, the states taken by output nodes are evaluated and credit or blame is assigned to each output node. The evaluations are propagated back to the other layers.

Stages of usage

1. Selection of the structure (of the network type)
2. Assignment of initial weights
3. Learning/teaching
4. Application

Perceptrons

Perceptrons' nodes are hard limiters or sigmoids.

Examples:

[Figure: examples of single-layer, double-layer and three-layer perceptrons.]

Learning in a single-layer perceptron

1. Initialize weights wi and threshold t to small random values.

2. Take an input x1, ..., xn and the desired output d.

3. Calculate the output x of the perceptron.

4. Adapt the weights:

    wi' = wi + h*(d - x)*xi,

where h < 1 is a positive gain value, and

    d = +1, if the input is from one class,
    d = -1, if the input is from the other class.

Repeat steps 2-4, if needed.

NB! The weights are changed only when the output x is incorrect (x ≠ d).
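The steps above can be sketched as a training loop (a sketch; the threshold update rule and the AND-like sample data are assumptions not stated on the slide -- the threshold is adapted as if it were a weight on a constant input of -1):

```python
import random

def train_perceptron(samples, h=0.1, epochs=100):
    """Single-layer perceptron learning, steps 1-4.

    samples: list of (inputs, d) pairs with d = +1 or -1.
    """
    random.seed(0)                      # deterministic "small random values"
    n = len(samples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    t = random.uniform(-0.05, 0.05)
    for _ in range(epochs):
        for xs, d in samples:
            # step 3: output through a hard limiter
            x = 1 if sum(wi * xi for wi, xi in zip(w, xs)) - t >= 0 else -1
            # step 4: wi' = wi + h*(d - x)*xi -- changes nothing when x == d
            w = [wi + h * (d - x) * xi for wi, xi in zip(w, xs)]
            t = t - h * (d - x)         # assumed threshold adaptation
    return w, t

# linearly separable data (an AND-like function on +1/-1 inputs)
data = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w, t = train_perceptron(data)
```

For separable data like this, the perceptron convergence theorem guarantees the loop settles on weights that classify every sample correctly.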

Regions separable by perceptrons

[Figure: decision regions for classes A and B -- a single-layered perceptron separates with a half-plane, a double-layered one with a convex region, and a three-layered one with arbitrary regions.]

Hopfield net

[Figure: a fully connected net with nodes x1, x2, ..., xn and outputs x1', x2', ..., xn'.]

Every node is connected to all other nodes, and the weights are symmetric (wij = wji). The net works with binary (+1, -1) input signals. The output is also a tuple of values +1 or -1. (Even a sigmoid can be used as the transfer function.)

Hopfield net

1. Initialize connection weights:

    wij = Σs xis * xjs,  i ≠ j,

where xis is +1 or -1 as in the description x1s, ..., xns of the class s.

2. Initialise the states with an unknown pattern x1, ..., xn.

3. Iterate until convergence (this can even be done asynchronously):

    xj' = f(Σi wij*xi),

where f is the hard limiter.

Remarks:

A Hopfield net can be used either as a classifier or as an associative memory. It always converges, but no match may occur. It works well when the number of classes is less than 0.15*n. There are several modifications of the Hopfield net architecture.
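The procedure can be sketched as follows (a minimal sketch; synchronous updates and a zero diagonal wii = 0 are assumptions -- the slide also allows asynchronous iteration):

```python
def hopfield_weights(exemplars):
    # wij = sum over classes s of xis*xjs, with i != j (diagonal set to 0)
    n = len(exemplars[0])
    return [[0 if i == j else sum(p[i] * p[j] for p in exemplars)
             for j in range(n)] for i in range(n)]

def hopfield_run(w, pattern, steps=20):
    # xj' = f(sum_i wij*xi), f = hard limiter, iterated until convergence
    x = list(pattern)
    n = len(x)
    for _ in range(steps):
        new = [1 if sum(w[i][j] * x[i] for i in range(n)) >= 0 else -1
               for j in range(n)]
        if new == x:        # converged
            return new
        x = new
    return x

# store two exemplar patterns, then recall from a corrupted input
w = hopfield_weights([[1, 1, 1, -1, -1, -1], [-1, -1, -1, 1, 1, 1]])
recalled = hopfield_run(w, [1, -1, 1, -1, -1, -1])   # one bit flipped
```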

Hamming net

The Hamming net calculates the Hamming distance to the exemplar of each class and shows positive output for the class with the minimal distance.

This net is widely used for restoring corrupted binary fixed-length signals.

A Hamming net works faster than a Hopfield net and has fewer connections for a larger number of input signals.

It implements the optimum minimum-error classifier when bit errors are random and independent.

Hamming net

[Figure: inputs x1, x2, ..., xn feed a lower subnet that calculates Hamming distances at middle nodes z1, z2, ..., zm; an upper subnet selects the best match and produces outputs y1, y2, ..., ym.]

Hamming net

The value at a middle node zs is n - hds, where hds is the Hamming distance to the exemplar pattern ps. Therefore, in the lower subnet the weight from input xi to the middle node zs is wis = xis/2, with offset ts = n/2 for each exemplar s.

Indeed, the value is 0 for the most incorrect code, and 1 = (+1 - (-1))*xis/2 is added for each correct input signal, so that a fully correct code gives n.

Hamming net continued

1. Initialize weights and offsets:

    a) lower subnet: wis = xis/2, ts = n/2 for each exemplar s;
    b) upper subnet: tk = 0, wsk = if k=s then 1 else -e, where 0 < e < 1/m.

2. Initialize the lower subnet with an (unknown) pattern x1, ..., xn and calculate

    yj = f(Σi wij*xi + tj).

3. Iterate in the upper subnet until convergence:

    yj' = f(yj - e*Σk≠j yk).
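The two subnets can be sketched together (a sketch; taking f to be threshold logic clipped below at 0, and adding the offset n/2 as derived on the previous slide, are assumptions):

```python
def hamming_net(exemplars, pattern, e=None, steps=100):
    """exemplars: list of +1/-1 exemplar patterns, one per class m."""
    n, m = len(pattern), len(exemplars)
    if e is None:
        e = 0.5 / m                       # any value with 0 < e < 1/m
    # lower subnet: ys = sum_i (xis/2)*xi + n/2 = n - HammingDistance
    y = [sum(p[i] * pattern[i] for i in range(n)) / 2 + n / 2
         for p in exemplars]
    # upper subnet ("maxnet"): yj' = f(yj - e*sum_{k!=j} yk)
    for _ in range(steps):
        total = sum(y)
        new = [max(yj - e * (total - yj), 0.0) for yj in y]
        if new == y:                      # converged
            break
        y = new
    return y                              # positive only for the winning class

# restore a corrupted 4-bit code: closer to the first exemplar
out = hamming_net([[1, 1, 1, 1], [-1, -1, -1, -1]], [1, 1, -1, 1])
```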

A comparator subnet

Here is a comparator subnet that selects the maximum of two analog inputs x0, x1. Combining several of these nets, one can build comparators for more inputs (4, 8 etc., approximately log2 n layers for n inputs). The output z is the maximum value; y0 and y1 indicate which input is the maximum. Dark nodes are hard limiters, light nodes are threshold-logic nodes, all thresholds are 0, and the weights are shown on the arcs.

[Figure: the comparator subnet with inputs x0, x1, outputs y0, y1 and z, and arc weights 1, -1 and 0.5.]

Carpenter-Grossberg net

This net forms clusters without supervision. Its clustering algorithm is similar to the simple leader clustering algorithm:

- select the first input as the exemplar for the first cluster;
- if the next input is close enough to some cluster exemplar, it is added to that cluster; otherwise it becomes the exemplar of a new cluster.

The net includes much feedback and is described by nonlinear differential equations.

Carpenter-Grossberg net

[Figure: a Carpenter-Grossberg net for three binary inputs x0, x1, x2 and two classes.]

Kohonen's feature maps

Kohonen's self-organizing feature map (K-map) uses an analogy with biological neural structures in which the placement of neurons is orderly and reflects the structure of external (sensed) stimuli (e.g. in auditory and visual pathways).

A K-map learns when continuous-valued input vectors are presented to it without specifying the desired output. The weights of connections adjust to regularities in the input. A large number of examples is needed.

The K-map mimics learning in biological neural structures well. It is usable in speech recognizers.

Kohonen's feature maps continued

This is a flat (two-dimensional) structure with connections between neighbors and connections from each input node to all output nodes.

It learns clusters of input vectors without any help from a teacher, and preserves closeness (topology).

[Figure: a continuous-valued input vector connected to a two-dimensional grid of output nodes.]

Learning in K-maps

1. Initialize weights to small random numbers and set the initial radius of the neighborhood of nodes.

2. Get an input x1, ..., xn.

3. Compute the distance dj to each output node:

    dj = Σi (xi - wij)^2

4. Select the output node s with minimal distance ds.

5. Update the weights of the node s and of all nodes in its neighborhood:

    wij' = wij + h*(xi - wij),

where h < 1 is a gain that decreases in time.

Repeat steps 2-5.
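Steps 1-5 can be sketched as follows (a sketch with several assumptions: a one-dimensional output row instead of the two-dimensional map on the previous slide, a linearly decaying gain and radius, and a fixed random seed):

```python
import random

def train_kmap(inputs, n_out, h0=0.5, radius0=1, epochs=50):
    """Kohonen map learning on a one-dimensional row of n_out output nodes."""
    random.seed(1)                                  # deterministic sketch
    n = len(inputs[0])
    w = [[random.uniform(0, 1) for _ in range(n)] for _ in range(n_out)]
    for epoch in range(epochs):
        h = h0 * (1 - epoch / epochs)               # gain decreases in time
        radius = max(0, round(radius0 * (1 - epoch / epochs)))
        for xs in inputs:
            # step 3: squared distance dj to each output node
            d = [sum((xi - wij) ** 2 for xi, wij in zip(xs, wj)) for wj in w]
            s = d.index(min(d))                     # step 4: winning node
            # step 5: update the winner and its neighborhood
            for j in range(max(0, s - radius), min(n_out, s + radius + 1)):
                w[j] = [wij + h * (xi - wij) for xi, wij in zip(xs, w[j])]
    return w

# two loose clusters of 2-d inputs, mapped onto two output nodes
weights = train_kmap([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]], 2)
```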

Bayesian networks

Bayesian networks use the conditional probability formula

    P(e,H) = P(H|e)P(e) = P(e|H)P(H)

binding the conditional probabilities of evidence e and hypothesis H.

A Bayesian network is a graph whose nodes are variables denoting the occurrence of events; arcs express causal dependence of events. Each node x has conditional probabilities for every possible combination of events influencing the node, i.e. for every collection of events in the nodes of pred(x) immediately preceding the node x in the graph.

Bayesian networks

Example:

[Figure: a network over nodes x1, ..., x6 in which x1 influences x2, x3 and x4; x2 influences x4 and x5; x3 influences x5; and x5 influences x6.]

The joint probability assessment for all nodes x1, ..., xn:

    P(x1,...,xn) = P(x1|pred(x1))*...*P(xn|pred(xn))

constitutes a joint-probability model that supports the assessed event combination. For the present example it is as follows:

    P(x1,...,x6) = P(x6|x5)*P(x5|x2,x3)*P(x4|x1,x2)*P(x3|x1)*P(x2|x1)*P(x1)
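The chain-rule product for this example can be checked numerically (a sketch; all probability tables below are made-up illustrative numbers for binary events, with pred as in the example graph):

```python
from itertools import product

def bern(p):
    # turn P(event occurs | parents) into a full conditional distribution
    return lambda v, *parents: p(*parents) if v else 1 - p(*parents)

# made-up conditional probability tables (one entry per parent combination)
cpt = {
    "x1":       bern(lambda: 0.6),
    "x2|x1":    bern(lambda x1: 0.7 if x1 else 0.2),
    "x3|x1":    bern(lambda x1: 0.4 if x1 else 0.5),
    "x4|x1,x2": bern(lambda x1, x2: 0.9 if x1 and x2 else 0.3),
    "x5|x2,x3": bern(lambda x2, x3: 0.8 if x2 or x3 else 0.1),
    "x6|x5":    bern(lambda x5: 0.75 if x5 else 0.05),
}

def joint(x1, x2, x3, x4, x5, x6):
    # P(x1,...,x6) = P(x6|x5)*P(x5|x2,x3)*P(x4|x1,x2)*P(x3|x1)*P(x2|x1)*P(x1)
    return (cpt["x1"](x1) * cpt["x2|x1"](x2, x1) * cpt["x3|x1"](x3, x1)
            * cpt["x4|x1,x2"](x4, x1, x2) * cpt["x5|x2,x3"](x5, x2, x3)
            * cpt["x6|x5"](x6, x5))

# summing the product over all 2^6 event combinations must give 1
total = sum(joint(*vs) for vs in product([True, False], repeat=6))
```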

Bayesian networks continued

A Bayesian network can be used for diagnosis/classification: given some events, the probabilities of events depending on the given ones can be predicted.

To construct a Bayesian network, one needs to

- determine its structure (topology);
- find the conditional probabilities for each dependency.

Taxonomy of neural nets

NEURAL NETS
- binary-valued inputs
  - supervised learning: Hopfield nets, Hamming nets
  - unsupervised learning: Carpenter-Grossberg nets
- continuous inputs
  - supervised learning: single-layered perceptrons, multi-layered perceptrons
  - unsupervised learning: Kohonen maps

28

A decision tree

outlook
humidity
windy
overcast
sunny
rain
high normal
true false
+
_
+
_
+

ID3 algorithm

To get the fastest decision-making procedure, one has to arrange attributes in a decision tree in a proper order - the most discriminating attributes first. This is done by the algorithm called ID3.

The most discriminating attribute can be defined in precise terms as the attribute for which fixing its value decreases the entropy of possible decisions the most. Let wj be the frequency of the j-th decision in a set of examples x. Then the entropy of the set is

    E(x) = - Σj wj * log(wj)

Let fix(x,a,v) denote the set of those elements of x whose value of attribute a is v. The average entropy that remains in x after the value of a has been fixed is:

    H(x,a) = Σv kv * E(fix(x,a,v)),

where kv is the ratio of examples in x with attribute a having value v.
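The two entropy measures above can be written out directly (a sketch; representing examples as (attribute-dict, decision) pairs and using base-2 logarithms are assumptions, since the slide leaves the base unspecified):

```python
from math import log2

def E(examples):
    # E(x) = -sum_j wj*log(wj), wj = frequency of the j-th decision in x
    decisions = [d for (_, d) in examples]
    ws = [decisions.count(c) / len(decisions) for c in set(decisions)]
    return -sum(w * log2(w) for w in ws)

def fix(examples, a, v):
    # elements of x whose value of attribute a is v
    return [(attrs, d) for (attrs, d) in examples if attrs[a] == v]

def H(examples, a):
    # H(x,a) = sum_v kv * E(fix(x,a,v)), kv = share of examples with a = v
    values = {attrs[a] for (attrs, _) in examples}
    return sum(len(fix(examples, a, v)) / len(examples) * E(fix(examples, a, v))
               for v in values)

xs = [({"windy": True}, "+"), ({"windy": True}, "+"),
      ({"windy": False}, "-"), ({"windy": False}, "+")]
```

Here fixing "windy" removes uncertainty in the windy=True half entirely, so H(xs, "windy") drops below E(xs).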

ID3 algorithm

ID3 uses the following variables and functions:

p -- pointer to the root of the decision tree being built;
x -- set of examples;
E(x) -- entropy of the set of examples x;
H(x,a) -- average entropy that remains in x after the value of a has been fixed;
atts(x) -- attributes of the set of examples x;
vals(a) -- values of the attribute a;
mark(p,d) -- mark node p with d;
newsucc(p,v) -- new successor to the node p, with attribute value v; returns a pointer p to the new node;
fix(x,a,v) -- subset of the given set of examples x with the value v of the attribute a.

ID3 continued

A.3.10:
    ID3(x,p) =
        if empty(x) then failure
        elif E(x) = 0 then mark(p, decision(x))
        else h := bignumber;
            for a ∈ atts(x) do
                if H(x,a) < h then h := H(x,a); am := a fi
            od;
            mark(p, am);
            for v ∈ vals(am,x) do
                ID3(fix(x,am,v), newsucc(p,v))
            od
        fi
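A.3.10 translates fairly directly into Python (a sketch; representing the tree as nested dicts is an assumption, and E, H and fix from the earlier slides are reimplemented here so that the fragment is self-contained):

```python
from math import log2

def E(xs):
    ws = [sum(1 for _, d in xs if d == c) / len(xs) for c in {d for _, d in xs}]
    return -sum(w * log2(w) for w in ws)

def fix(xs, a, v):
    return [(attrs, d) for attrs, d in xs if attrs[a] == v]

def H(xs, a):
    vals = {attrs[a] for attrs, _ in xs}
    return sum(len(fix(xs, a, v)) / len(xs) * E(fix(xs, a, v)) for v in vals)

def id3(xs, attributes):
    if not xs:
        raise ValueError("failure: empty example set")
    if E(xs) == 0:
        return xs[0][1]                              # mark(p, decision(x))
    am = min(attributes, key=lambda a: H(xs, a))     # most discriminating attribute
    vals = {attrs[am] for attrs, _ in xs}
    # one subtree per value of am, built from the examples fixed to that value
    return {am: {v: id3(fix(xs, am, v), [a for a in attributes if a != am])
                 for v in vals}}

xs = [({"windy": True, "hot": True}, "-"),
      ({"windy": True, "hot": False}, "-"),
      ({"windy": False, "hot": True}, "+"),
      ({"windy": False, "hot": False}, "+")]
tree = id3(xs, ["windy", "hot"])
```

Here "windy" alone determines the class, so it ends up at the root and "hot" is never tested.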

AQ algorithm

This algorithm is for learning knowledge in the form of rules.

The algorithm AQ(ex,cl) builds a set of rules from the given set of examples ex for the collection of classes cl, using the function aqrules(p,n,c) for building a set of rules for a class c from its given positive examples p and negative examples n.

pos(ex,c) is the set of positive examples for class c in ex.
neg(ex,c) is the set of negative examples for class c in ex.
covers(r,e) is a predicate which is true when example e satisfies the rule r.
prune(rules) throws away rules covered by some other rule.

AQ continued

A.3.11:
    AQ(ex,cl) =
        allrules := { };
        for c ∈ cl do
            allrules := allrules ∪ aqrules(pos(ex,c), neg(ex,c), c)
        od;
        return(allrules)

    aqrules(pos,neg,c) =
        rules := {aqrule(selectFrom(pos), neg, c)};
        for e ∈ pos do
            L: {
                for r ∈ rules do
                    if covers(r,e) then break L fi
                od;
                rules := rules ∪ {aqrule(e,neg,c)};
                prune(rules)
            }
        od;
        return(rules)

AQ continued

aqrule(seed,neg,c) -- builds a new rule from the initial condition seed and negative examples neg for the class c.
newtests(r,seed,e) -- generates amendments q to the rule r such that r&q covers seed and not e;
worstelements(star) -- chooses the least promising elements in star.

    aqrule(seed,neg,c) =
        star := {true};
        for e ∈ neg do
            for r ∈ star do
                if covers(r,e) then
                    star := (star ∪ {r&q | q ∈ newtests(r,seed,e)}) \ {r}
                fi;
                while size(star) > maxstar do
                    star := star \ worstelements(star)
                od
            od
        od;
        return("if" bestin(star) "then" c)
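The aqrule search can be sketched on attribute-value examples (a heavily simplified sketch: rules are sets of (attribute, value) tests, worstelements is approximated by dropping the last star element, and bestin by taking the shortest rule -- all of these choices are assumptions):

```python
def covers(rule, e):
    # rule is a set of (attribute, value) tests; true when e satisfies them all
    return all(e[a] == v for a, v in rule)

def newtests(rule, seed, e):
    # amendments q such that rule & q still covers seed but no longer covers e
    return [(a, seed[a]) for a in seed
            if seed[a] != e[a] and (a, seed[a]) not in rule]

def aqrule(seed, neg, maxstar=5):
    # specialize the universally true rule until it rejects every negative example
    star = [frozenset()]                       # {true}
    for e in neg:
        for r in list(star):
            if covers(r, e):
                star.remove(r)
                star.extend(r | {q} for q in newtests(r, seed, e))
            while len(star) > maxstar:
                star.pop()                     # drop a "least promising" element
        if not star:
            raise ValueError("seed cannot be separated from a negative example")
    return min(star, key=len)                  # bestin: prefer the shortest rule

rule = aqrule({"outlook": "overcast", "windy": False},
              [{"outlook": "sunny", "windy": False},
               {"outlook": "rain", "windy": True}])
```

Only the outlook test is needed to exclude both negative examples, so the returned rule is the single test outlook = overcast.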

A clustering problem (learning without a teacher)

[Figure: sample data points to be grouped into clusters.]

Hierarchy of learning methods

Learning
- parametric learning
  - by automata
  - numerically
- massively parallel learning
  - neural nets
  - genetic algorithms
- symbolic learning
  - search in concept space
    - specific to general
    - general to specific
  - inductive inference
  - inverse resolution

Decision tables

A decision table is a compact representation of knowledge, when the knowledge is needed for making a choice among a finite (and not very large) set of possibilities.

A decision table is a compound table consisting of three types of areas.

The list of conditions (C1, C2, ..., Ck are conditions, written in some formal language translatable into a program):

    C1
    C2
    ...
    Ck

Decision tables

The selection matrix consists of columns corresponding to the conditions. Each cell of the table may hold one of three values:

    y -- yes, the condition must be satisfied;
    n -- no, the condition must not be satisfied;
    0 -- it does not matter whether the condition is satisfied or not (the cell is then often simply left empty).

    C1 C2 .... Ck

Decision tables

The third area of the table holds the selectable decisions. If there are two areas each of the first and second type, the third area can also be given in matrix form:

[Figure: a decision table whose rows and columns are labelled by conditions with y/n entries, and whose cells hold the decisions.]
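Interpreted in code, a decision table is a lookup over condition columns (a sketch with made-up conditions and decisions; the first-matching-column policy is an assumption):

```python
# conditions C1, C2 written as predicates over a situation (a dict)
conditions = [
    lambda s: s["temperature"] > 30,   # C1
    lambda s: s["humidity"] > 80,      # C2
]

# selection matrix: one 'y'/'n'/'0' column per rule, plus its decision
rules = [
    (("y", "y"), "storm warning"),
    (("y", "n"), "heat warning"),
    (("n", "0"), "no warning"),
]

def decide(situation):
    # a column matches when every 'y' condition holds and every 'n' fails;
    # '0' cells do not matter
    for column, decision in rules:
        if all(cell == "0" or cond(situation) == (cell == "y")
               for cond, cell in zip(conditions, column)):
            return decision
    return None
```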

Bibliography

Kohonen, T. (1984) Self-Organization and Associative Memory. Springer Verlag, Holland.

Lippmann, R. (1987) An Introduction to Computing with Neural Nets. IEEE ASSP Magazine, No. 4, 4-22.

Michalski, S. (1983) Theory and methodology of inductive learning. In: S. Michalski, J. Carbonell, T. Mitchell, eds. Machine learning: an Artificial Intelligence approach. Tioga, Palo Alto, 83-134.

Exercises

Sample data for ID3:

    Outlook   Temperature  Humidity  Windy  Class
    Sunny     Hot          High      False  -
    Sunny     Hot          High      True   -
    Overcast  Hot          High      False  +
    Rain      Mild         High      False  +
    Rain      Cool         Normal    False  +
    Rain      Cool         Normal    True   -
    Overcast  Cool         Normal    True   +
    Sunny     Mild         High      False  -
    Sunny     Cool         Normal    False  +
    Rain      Mild         Normal    False  +
    Sunny     Mild         Normal    True   +
    Overcast  Mild         High      True   +
    Overcast  Hot          Normal    False  +
    Rain      Mild         High      True   -

1. Calculate the entropies of attributes; 2. build a decision tree.
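As a cross-check for exercise 1, E(x) and H(x,a) can be computed directly on this table (a sketch; base-2 logarithm is an assumption). With these data the attribute with the lowest remaining entropy is Outlook, which matches the decision tree shown earlier:

```python
from math import log2

# the sample data from the table above, one tuple per row, class last
rows = [
    ("Sunny", "Hot", "High", False, "-"), ("Sunny", "Hot", "High", True, "-"),
    ("Overcast", "Hot", "High", False, "+"), ("Rain", "Mild", "High", False, "+"),
    ("Rain", "Cool", "Normal", False, "+"), ("Rain", "Cool", "Normal", True, "-"),
    ("Overcast", "Cool", "Normal", True, "+"), ("Sunny", "Mild", "High", False, "-"),
    ("Sunny", "Cool", "Normal", False, "+"), ("Rain", "Mild", "Normal", False, "+"),
    ("Sunny", "Mild", "Normal", True, "+"), ("Overcast", "Mild", "High", True, "+"),
    ("Overcast", "Hot", "Normal", False, "+"), ("Rain", "Mild", "High", True, "-"),
]
attrs = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(examples):
    # E(x) over the class column
    classes = [r[-1] for r in examples]
    return -sum((classes.count(c) / len(classes))
                * log2(classes.count(c) / len(classes))
                for c in set(classes))

def remaining_entropy(examples, i):
    # H(x,a): weighted entropy after fixing attribute column i
    out = 0.0
    for v in {r[i] for r in examples}:
        part = [r for r in examples if r[i] == v]
        out += len(part) / len(examples) * entropy(part)
    return out

for i, a in enumerate(attrs):
    print(a, round(remaining_entropy(rows, i), 3))
```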