© Enn Tyugu
Algorithms of Artificial Intelligence
Lecture 6: Learning
E. Tyugu
Content
• Basic concepts
  – transfer function
  – classification
  – stages of usage
• Perceptron
• Hopfield net
• Hamming net
• Carpenter-Grossberg net
• Kohonen's feature maps
• Bayesian networks
• ID3
• AQ
Neural nets
Neural nets provide another form of massively parallel learning functionality. They are well suited for learning pattern recognition.
A simple way to describe a neural net is to represent it as a graph. Each node of the graph has an associated variable called state and a constant called threshold. Each arc of the graph has an associated numeric value called weight.
The behaviour of a neural net is determined by transfer functions for nodes, which compute new values of states from previous states of neighbouring nodes.
Node of a net
A common transfer function is of the form
$x_j = f\left(\sum_i w_{ij} x_i - t_j\right)$
where the sum is taken over incoming arcs with weights $w_{ij}$, $x_i$ are the states of the neighbouring nodes, and $t_j$ is the threshold of the node j where the new state is computed. Learning in neural nets means changing the weights in the right way.
[Figure: node j with inputs x1, …, xn on arcs weighted w1j, …, wnj, transfer function f and output xj.]
Transfer functions
[Figure: plots of f(x) against x for three transfer functions: hard limiter, sigmoid, threshold logic.]
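The three transfer functions and the node formula $x_j = f(\sum_i w_{ij} x_i - t_j)$ can be sketched in Python as follows. The exact shapes are assumptions, since the slides only name the functions: the hard limiter is taken to return +1 or −1, threshold logic is taken to be linear and clipped to [0, 1], and the sigmoid is the logistic function. The name node_state is hypothetical.

```python
import math

def hard_limiter(a):
    """Assumed convention: +1 for a non-negative argument, otherwise -1."""
    return 1 if a >= 0 else -1

def threshold_logic(a):
    """Assumed shape: linear between 0 and 1, clipped outside that range."""
    return max(0.0, min(1.0, a))

def sigmoid(a):
    """Logistic function, a smooth S-shaped curve with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def node_state(weights, states, threshold, f):
    """New state of one node: f(sum_i w_ij * x_i - t_j)."""
    activation = sum(w * x for w, x in zip(weights, states)) - threshold
    return f(activation)

# One node with two incoming arcs.
print(node_state([0.5, -0.25], [1, 1], 0.1, hard_limiter))
```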
Forward-pass and layered nets
1. A forward-pass neural net is an acyclic graph. Its nodes can be classified as input, output and internal nodes. Input nodes do not have neighbours on incoming arcs, output nodes do not have them on outgoing arcs, and internal nodes possess both kinds of neighbours.
2. A layered (n-layered) net is a forward-pass net where each path from an input node to an output node contains exactly n nodes. Each node in such a graph belongs to exactly one layer. An n-layered net is strongly connected if each node in the i-th layer is connected to all nodes of the (i+1)-st layer, i = 1, 2, ..., n−1. States of the layered net can be interpreted as decisions made on the basis of the states of the input nodes.
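A minimal sketch of computing the states of a strongly connected layered net, layer by layer, assuming hard-limiter nodes with states +1 and −1. The function name forward_pass and the hand-picked weights are illustrative assumptions; the example is a 3-layered net (input, internal, output) that decides the exclusive-or of two inputs.

```python
def hard_limiter(a):
    """Assumed convention: +1 for a non-negative argument, otherwise -1."""
    return 1 if a >= 0 else -1

def forward_pass(inputs, layers, f=hard_limiter):
    """Propagate input states through the layers.

    layers is a list of (weights, thresholds) pairs; weights[j][i] is the
    weight of the arc from node i of the previous layer to node j.
    """
    states = list(inputs)
    for weights, thresholds in layers:
        states = [f(sum(w * x for w, x in zip(row, states)) - t)
                  for row, t in zip(weights, thresholds)]
    return states

# Internal layer: an OR node and a NAND node; output layer: an AND node.
xor_layers = [
    ([[1, 1], [-1, -1]], [-1, -1]),
    ([[1, 1]], [1]),
]
for pattern in ([1, 1], [1, -1], [-1, 1], [-1, -1]):
    print(pattern, forward_pass(pattern, xor_layers))
```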
Layered neural net
[Figure: a layered neural net with its input nodes, intermediate nodes and output nodes labelled.]
Learning in a layered net can be performed by means of back-propagation. In this case, the states taken by output nodes are evaluated and credit or blame is assigned to each output node. The evaluations are propagated back to other layers.
Stages of usage
1. Selection of the structure (of the network type)
2. Assignment of initial weights
3. Learning/teaching
4. Application
Perceptrons
Perceptrons' nodes are hard limiters or sigmoids.
Examples:
[Figure: a single-layer, a double-layer and a three-layer perceptron.]
Learning in a single-layer perceptron
1. Initialize weights $w_i$ and threshold t to small random values.
2. Take an input x1, …, xn and the desired output d.
3. Calculate output x of the perceptron.
4. Adapt the weights:
$w_i' = w_i + h (d - x) x_i$,
where h < 1 is a positive gain value, d = +1 if the input is from one class, and d = −1 if the input is from the other class.
Repeat steps 2-4 as needed.
NB! The weights change only when the output x is incorrect, because d − x = 0 whenever x = d.
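The learning steps above can be sketched as follows, assuming a hard-limiter node. The slide adapts only the weights; adapting the threshold as well (treating it as a weight on a constant input −1) is an assumption added here so that the example can converge. The names train_perceptron and predict, the gain h = 0.1 and the training data (logical AND on ±1 inputs) are illustrative.

```python
import random

def hard_limiter(a):
    return 1 if a >= 0 else -1

def predict(w, t, xs):
    """Step 3: output of the perceptron for input xs."""
    return hard_limiter(sum(wi * xi for wi, xi in zip(w, xs)) - t)

def train_perceptron(samples, h=0.1, epochs=100, seed=0):
    rng = random.Random(seed)
    n = len(samples[0][0])
    # Step 1: small random initial weights and threshold.
    w = [rng.uniform(-0.05, 0.05) for _ in range(n)]
    t = rng.uniform(-0.05, 0.05)
    for _ in range(epochs):
        errors = 0
        for xs, d in samples:            # step 2: input and desired output
            x = predict(w, t, xs)        # step 3
            if x != d:                   # step 4: nothing changes when x == d
                errors += 1
                w = [wi + h * (d - x) * xi for wi, xi in zip(w, xs)]
                t -= h * (d - x)         # assumption: threshold adapted too
        if errors == 0:
            break
    return w, t

# Two linearly separable classes: logical AND on inputs +1 / -1.
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, t = train_perceptron(and_samples)
print([predict(w, t, xs) for xs, _ in and_samples])
```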
Regions separable by perceptrons
[Figure: decision regions for classes A and B that single-layered, double-layered and three-layered perceptrons can separate.]
Hopfield net
[Figure: Hopfield net with inputs x1, x2, …, xn and outputs x1', x2', …, xn'.]
Every node is connected to all other nodes, and the weights are symmetric ($w_{ij} = w_{ji}$).
The net works with binary (+1, −1) input signals. The output is also a tuple of values +1 or −1.
Even a sigmoid can be used as the transfer function.
Hopfield net
1. Initialize connection weights:
$w_{ij} = \sum_s x_{is} x_{js}$, $i \neq j$,
where $x_{is}$ is +1 or −1 as in the description $x_{1s}, …, x_{ns}$ of the class s.
2. Initialise states with an unknown pattern x1, …, xn.
3. Iterate until convergence (this can even be done asynchronously):
$x_j' = f\left(\sum_i w_{ij} x_i\right)$,
where f is the hard limiter.
Remarks:
• A Hopfield net can be used either as a classifier or as an associative memory.
• It always converges, but no match may occur.
• It works well when the number of classes is less than 0.15*n.
• There are several modifications of the Hopfield net architecture.
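A small sketch of the three steps, assuming asynchronous updates in a fixed node order and a hard limiter that returns +1 for a zero argument. The function names and the two 8-bit exemplars are illustrative assumptions.

```python
def hard_limiter(a):
    return 1 if a >= 0 else -1

def hopfield_weights(exemplars):
    """Step 1: w_ij = sum over classes s of x_is * x_js, with w_ii = 0."""
    n = len(exemplars[0])
    return [[0 if i == j else sum(p[i] * p[j] for p in exemplars)
             for j in range(n)] for i in range(n)]

def hopfield_recall(w, pattern, max_sweeps=100):
    """Steps 2 and 3: start from the pattern and update until nothing changes."""
    x = list(pattern)
    n = len(x)
    for _ in range(max_sweeps):
        changed = False
        for j in range(n):               # asynchronous: one node at a time
            new = hard_limiter(sum(w[i][j] * x[i] for i in range(n)))
            if new != x[j]:
                x[j] = new
                changed = True
        if not changed:
            break
    return x

class_a = [1, 1, 1, 1, -1, -1, -1, -1]
class_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = hopfield_weights([class_a, class_b])
noisy_a = [-1, 1, 1, 1, -1, -1, -1, -1]   # class_a with the first bit flipped
print(hopfield_recall(weights, noisy_a))
```

Used this way the net acts as an associative memory: the corrupted pattern is driven back to the stored exemplar it is closest to.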
Hamming net
The Hamming net calculates the Hamming distance to the exemplar of each class and gives a positive output for the class with the minimal distance.
This net is widely used for restoring corrupted binary fixed-length signals.
The Hamming net works faster than the Hopfield net and has fewer connections for a larger number of input signals.
It implements the optimum minimum-error classifier when bit errors are random and independent.
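Hamming net
[Figure: Hamming net. Inputs x1, x2, …, xn feed a lower subnet that calculates Hamming distances at the middle nodes z1, z2, …, zm; an upper subnet selects the best match and produces the outputs y1, y2, …, ym.]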
Hamming net
The value at a middle node $z_s$ is $n - hd_s$, where $hd_s$ is the Hamming distance to the exemplar pattern $p_s$.
Therefore in the lower subnet the weight from input $x_i$ to the middle node $z_s$ is $w_{is} = x_{is}/2$, and the offset is $t_s = n/2$ for each exemplar s, so that $z_s = \sum_i w_{is} x_i + n/2$.
Indeed, the value is 0 for the most incorrect code (every bit wrong), and each correct input signal adds 1 = (+1 − (−1))/2, because its term $x_i x_{is}/2$ changes from −1/2 to +1/2. This gives n for the correct code.
Hamming net continued
1. Initialize weights and offsets:
a) lower subnet: $w_{is} = x_{is}/2$, $t_s = n/2$ for each exemplar s;
b) upper subnet: $t_k = 0$, $w_{sk} = 1$ if k = s, else $-e$, where $0 < e < 1/m$.
2. Initialize the lower subnet with the (unknown) pattern x1, …, xn and calculate
$y_s = f\left(\sum_i w_{is} x_i + t_s\right)$,
where the offset $t_s$ is added so that the value equals $n - hd_s$.
3. Iterate in the upper subnet until convergence:
$y_j' = f\left(y_j - e \sum_{k \neq j} y_k\right)$.
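The two subnets can be sketched as follows. Two assumptions are made: the offset n/2 is added in the lower subnet, so that each score equals n minus the Hamming distance, and f in the upper subnet is a non-saturating threshold-logic function max(0, a). The names matching_scores and select_best and the choice e = 0.5/m are illustrative.

```python
def matching_scores(exemplars, x):
    """Lower subnet: sum_i (x_is / 2) * x_i + n / 2 = n - Hamming distance."""
    n = len(x)
    return [sum(p_i / 2 * x_i for p_i, x_i in zip(p, x)) + n / 2
            for p in exemplars]

def select_best(scores, max_iter=1000):
    """Upper subnet: y_j' = f(y_j - e * sum_{k != j} y_k) until nothing changes."""
    m = len(scores)
    e = 0.5 / m                          # any value with 0 < e < 1/m
    y = list(scores)
    for _ in range(max_iter):
        total = sum(y)
        new_y = [max(0.0, y_j - e * (total - y_j)) for y_j in y]
        if new_y == y:
            break
        y = new_y
    return y

exemplars = [[1, 1, 1, 1], [-1, -1, -1, -1], [1, -1, 1, -1]]
pattern = [1, 1, -1, 1]                  # first exemplar with one bit flipped
scores = matching_scores(exemplars, pattern)
outputs = select_best(scores)
print(scores, outputs)
```

After convergence only the node of the best-matching class keeps a positive output. If two classes tie for the best score, neither suppresses the other quickly, which is why the iteration count is bounded.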
A comparator subnet
Here is a comparator subnet that selects the maximum of two analog inputs x0, x1. Combining several of these nets one builds comparators for more inputs (4, 8 etc., approximately $\log_2 n$ layers for n inputs).
Output z is the maximum value, y0 and y1 indicate which input is the maximum, dark nodes are hard limiters, light nodes are threshold logic nodes, all thresholds are 0, and weights are shown on arcs.
[Figure: comparator subnet with inputs x0, x1 and outputs y0, y1, z; the arc weights shown are 1, −1 and 0.5.]
Carpenter-Grossberg net
This net forms clusters without supervision. Its clustering algorithm is similar to the simple leader clustering algorithm:
• select the first input as the exemplar for the first cluster;
• if the next input is close enough to some cluster exemplar, it is added to the cluster, otherwise it becomes the exemplar of a new cluster.
The net includes much feedback and is described by nonlinear differential equations.
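The simple leader clustering algorithm mentioned above can be sketched as follows. This is the leader algorithm only, not the Carpenter-Grossberg net itself; measuring closeness by Hamming distance on binary tuples and the parameter name max_distance are assumptions.

```python
def hamming_distance(a, b):
    return sum(1 for u, v in zip(a, b) if u != v)

def leader_clustering(inputs, max_distance, distance=hamming_distance):
    """Return a list of (exemplar, members) pairs."""
    clusters = []
    for x in inputs:
        for exemplar, members in clusters:
            if distance(x, exemplar) <= max_distance:
                members.append(x)        # close enough: join this cluster
                break
        else:
            clusters.append((x, [x]))    # otherwise x starts a new cluster
    return clusters

binary_inputs = [(1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 1, 1)]
clusters = leader_clustering(binary_inputs, max_distance=1)
for exemplar, members in clusters:
    print(exemplar, members)
```

The result depends on the order in which inputs arrive, since the first input of every cluster stays its exemplar.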
Carpenter-Grossberg net
[Figure: Carpenter-Grossberg net for three binary inputs x0, x1, x2 and two classes.]
Kohonen's feature maps
A Kohonen self-organizing feature map (K-map) uses an analogy with biological neural structures in which the placement of neurons is orderly and reflects the structure of external (sensed) stimuli (e.g. in auditory and visual pathways).
A K-map learns when continuous-valued input vectors are presented to it without specifying the desired output. The weights of connections can adjust to regularities in the input. A large number of examples is needed.
A K-map mimics learning in biological neural structures well.
It is usable in speech recognizers.
Kohonen's feature maps continued
This is a flat (two-dimensional) structure with connections between neighbors and connections from each input node to all its output nodes.
It learns clusters of input vectors without any help from a teacher.
It preserves closeness (topology).
[Figure: a continuous-valued input vector connected to a two-dimensional array of output nodes.]
Learning in K-maps
1. Initialize weights to small random numbers and set the initial radius of the neighborhood of nodes.
2. Get an input x1, …, xn.
3. Compute the distance $d_j$ to each output node:
$d_j = \sum_i (x_i - w_{ij})^2$
4. Select the output node s with minimal distance $d_s$.
5. Update weights for the node s and all nodes in its neighborhood:
$w_{ij}' = w_{ij} + h (x_i - w_{ij})$,
where h < 1 is a gain that decreases in time.
Repeat steps 2-5.
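A minimal sketch of steps 1-5 on a rectangular grid of output nodes. Several details are assumptions, since the slide leaves them open: the gain and the neighborhood radius both decay linearly to zero, the neighborhood is a square around the selected node, and inputs are drawn at random from the data. The name train_kmap and all parameter values are illustrative.

```python
import random

def train_kmap(data, rows, cols, steps=1000, h0=0.5, radius0=2, seed=0):
    rng = random.Random(seed)
    n = len(data[0])
    # Step 1: small random weights, one weight vector per output node.
    w = {(r, c): [rng.uniform(0.0, 0.1) for _ in range(n)]
         for r in range(rows) for c in range(cols)}
    for step in range(steps):
        frac = 1.0 - step / steps        # assumed linear decay schedule
        h = h0 * frac
        radius = round(radius0 * frac)
        x = rng.choice(data)             # step 2: get an input
        # Steps 3 and 4: node with the minimal squared distance to x.
        s = min(w, key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, w[j])))
        # Step 5: move the winner and its neighbors towards x.
        for (r, c), wj in w.items():
            if max(abs(r - s[0]), abs(c - s[1])) <= radius:
                for i in range(n):
                    wj[i] += h * (x[i] - wj[i])
    return w

samples = [(0.1, 0.1), (0.15, 0.05), (0.9, 0.95), (0.85, 0.9)]
kmap = train_kmap(samples, rows=3, cols=3)
for node in sorted(kmap):
    print(node, [round(v, 2) for v in kmap[node]])
```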
Bayesian networks
Bayesian networks use the conditional probability formula
$P(e,H) = P(H|e)P(e) = P(e|H)P(H)$
binding the conditional probabilities of evidence e and hypothesis H.
A Bayesian network is a graph whose nodes are variables denoting occurrence of events, and whose arcs express causal dependence of events. Each node x has conditional probabilities for every possible combination of events influencing the node, i.e. for every collection of events in the nodes of pred(x) immediately preceding the node x in the graph.
Bayesian networks
Example:
[Figure: example Bayesian network with nodes x1, x2, x3, x4, x5, x6.]
The joint probability assessment for all nodes x1, …, xn:
$P(x_1, …, x_n) = P(x_1|pred(x_1)) * ... * P(x_n|pred(x_n))$
constitutes a joint-probability model that supports the assessed event combination. For the present example it is as follows:
$P(x_1, …, x_6) = P(x_6|x_5) * P(x_5|x_2,x_3) * P(x_4|x_1,x_2) * P(x_3|x_1) * P(x_2|x_1) * P(x_1)$
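The factorization for the six-node example can be checked numerically with a short sketch. The structure pred follows the formula above; all probability values are hypothetical numbers chosen only for illustration, and every event is assumed to be binary.

```python
from itertools import product

# pred(x) for each node, as read off the factorization of the example.
pred = {"x1": (), "x2": ("x1",), "x3": ("x1",),
        "x4": ("x1", "x2"), "x5": ("x2", "x3"), "x6": ("x5",)}

# P(node is True | values of pred(node)); hypothetical numbers.
p_true = {
    "x1": {(): 0.3},
    "x2": {(True,): 0.8, (False,): 0.1},
    "x3": {(True,): 0.6, (False,): 0.2},
    "x4": {(True, True): 0.9, (True, False): 0.5,
           (False, True): 0.4, (False, False): 0.05},
    "x5": {(True, True): 0.95, (True, False): 0.6,
           (False, True): 0.5, (False, False): 0.02},
    "x6": {(True,): 0.7, (False,): 0.1},
}

def joint(assignment):
    """P(x1, ..., x6) as the product of P(x | pred(x)) over all nodes."""
    p = 1.0
    for node, parents in pred.items():
        pt = p_true[node][tuple(assignment[q] for q in parents)]
        p *= pt if assignment[node] else 1.0 - pt
    return p

all_true = {node: True for node in pred}
total = sum(joint(dict(zip(pred, values)))
            for values in product([True, False], repeat=len(pred)))
print(joint(all_true), total)
```

Summing the joint probability over all 64 event combinations gives 1, which is a quick sanity check that the conditional probability tables are complete.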
Bayesian networks continued
A Bayesian network can be used for diagnosis/classification: given some events, the probabilities of events depending on the given ones can be predicted.
To construct a Bayesian network, one needs to
• determine its structure (topology)
• find conditional probabilities for each dependency.
Taxonomy of neural nets
NEURAL NETS
  BINARY-VALUED INPUTS
    SUPERVISED LEARNING: HOPFIELD NETS, HAMMING NETS
    UNSUPERVISED LEARNING: CARPENTER-GROSSBERG NETS
  CONTINUOUS INPUTS
    SUPERVISED LEARNING: SINGLE-LAYERED PERCEPTRONS, MULTILAYERED PERCEPTRONS
    UNSUPERVISED LEARNING: KOHONEN MAPS
A decision tree
outlook
  sunny: humidity
    high: −
    normal: +
  overcast: +
  rain: windy
    true: −
    false: +
ID3 algorithm
• To get the fastest decision-making procedure, one has to arrange attributes in a decision tree in a proper order, with the most discriminating attributes first. This is done by the algorithm called ID3.
• The most discriminating attribute can be defined in precise terms as the attribute for which fixing its value changes the entropy of possible decisions the most. Let $w_j$ be the frequency of the j-th decision in a set of examples x. Then the entropy of the set is
$E(x) = -\sum_j w_j \log(w_j)$
• Let fix(x,a,v) denote the set of those elements of x whose value of attribute a is v. The average entropy that remains in x after the value of a has been fixed is
$H(x,a) = \sum_v k_v E(fix(x,a,v))$,
where $k_v$ is the ratio of examples in x with attribute a having value v.
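The two formulas can be sketched directly in Python. The logarithm base is not given on the slide; base 2 is assumed here. Examples are represented as (attributes, decision) pairs, and the attribute names and the four examples are hypothetical.

```python
import math
from collections import Counter

def entropy(x):
    """E(x) = -sum_j w_j * log2(w_j), w_j = relative frequency of decision j."""
    counts = Counter(decision for _, decision in x)
    return -sum(c / len(x) * math.log2(c / len(x)) for c in counts.values())

def fix(x, a, v):
    """Examples of x whose attribute a has the value v."""
    return [(attrs, d) for attrs, d in x if attrs[a] == v]

def remaining_entropy(x, a):
    """H(x,a) = sum_v k_v * E(fix(x,a,v)), k_v = share of examples with a = v."""
    values = {attrs[a] for attrs, _ in x}
    return sum(len(fix(x, a, v)) / len(x) * entropy(fix(x, a, v))
               for v in values)

examples = [
    ({"windy": True, "colour": "red"}, "-"),
    ({"windy": True, "colour": "blue"}, "-"),
    ({"windy": False, "colour": "red"}, "+"),
    ({"windy": False, "colour": "blue"}, "+"),
]
print(entropy(examples))
print(remaining_entropy(examples, "windy"), remaining_entropy(examples, "colour"))
```

Here windy is the most discriminating attribute: fixing its value leaves no entropy, while fixing colour leaves the entropy unchanged.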
ID3 algorithm
ID3 uses the following variables and functions:
p – pointer to the root of the decision tree being built;
x – set of examples;
E(x) – entropy of the set of examples x;
H(x,a) – average entropy that remains in x after the value of a has been fixed;
atts(x) – attributes of the set of examples x;
vals(a) – values of the attribute a;
mark(p,d) – mark node p with d;
newsucc(p,v) – new successor to the node p, with attribute value v; returns a pointer to the new node;
fix(x,a,v) – subset of the given set of examples x with the value v of the attribute a.
ID3 continued
A.3.10:
ID3(x,p) =
  if empty(x) then failure
  elif E(x)=0 then mark(p,decision(x))
  else
    h := bignumber;
    for a ∈ atts(x) do
      if H(x,a) < h then h := H(x,a); am := a fi
    od;
    mark(p,am);
    for v ∈ vals(am,x) do
      ID3(fix(x,am,v),newsucc(p,v))
    od
  fi
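A compact Python sketch of A.3.10. It differs from the pseudocode in a few labelled ways: the tree is returned as nested dictionaries instead of being built through mark and newsucc, the attribute list is passed explicitly, base-2 logarithms are assumed, and a majority decision is returned when no attributes are left (a case the pseudocode does not cover). The six examples are hypothetical.

```python
import math
from collections import Counter

def entropy(x):
    counts = Counter(d for _, d in x)
    return -sum(c / len(x) * math.log2(c / len(x)) for c in counts.values())

def fix(x, a, v):
    return [(attrs, d) for attrs, d in x if attrs[a] == v]

def remaining_entropy(x, a):
    values = {attrs[a] for attrs, _ in x}
    return sum(len(fix(x, a, v)) / len(x) * entropy(fix(x, a, v))
               for v in values)

def id3(x, attributes):
    if not x:
        raise ValueError("failure: empty set of examples")
    if entropy(x) == 0:
        return x[0][1]                   # all examples share one decision
    if not attributes:                   # assumption: fall back to majority
        return Counter(d for _, d in x).most_common(1)[0][0]
    # The attribute am with the least remaining entropy H(x, am).
    am = min(attributes, key=lambda a: remaining_entropy(x, a))
    rest = [a for a in attributes if a != am]
    values = sorted({attrs[am] for attrs, _ in x})
    return {am: {v: id3(fix(x, am, v), rest) for v in values}}

examples = [
    ({"outlook": "overcast", "windy": False}, "+"),
    ({"outlook": "overcast", "windy": True}, "+"),
    ({"outlook": "rain", "windy": False}, "+"),
    ({"outlook": "rain", "windy": True}, "-"),
    ({"outlook": "sunny", "windy": False}, "-"),
    ({"outlook": "sunny", "windy": True}, "-"),
]
tree = id3(examples, ["outlook", "windy"])
print(tree)
```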
AQ algorithm
This algorithm is for learning knowledge in the form of rules.
The algorithm AQ(ex,cl) builds a set of rules from the given set of examples ex for the collection of classes cl, using the function aqrules(p,n,c) for building a set of rules for a class c from its given positive examples p and negative examples n.
pos(ex,c) – the set of positive examples for class c in ex;
neg(ex,c) – the set of negative examples for class c in ex;
covers(r,e) – a predicate which is true when example e satisfies the rule r;
prune(rules) – throws away rules covered by some other rule.
AQ continued
A.3.11:
AQ(ex,cl) =
  allrules := { };
  for c ∈ cl do
    allrules := allrules ∪ aqrules(pos(ex,c),neg(ex,c),c)
  od;
  return(allrules)

aqrules(pos,neg,c) =
  rules := {aqrule(selectFrom(pos),neg,c)};
  for e ∈ pos do
    L: {
      for r ∈ rules do
        if covers(r,e) then break L fi
      od;
      rules := rules ∪ {aqrule(e,neg,c)};
      prune(rules)
    }
  od;
  return(rules)
AQ continued
aqrule(seed,neg,c) – builds a new rule from the initial condition seed and negative examples neg for the class c;
newtests(r,seed,e) – generates amendments q to the rule r such that r&q covers seed and not e;
worstelements(star) – chooses the least promising elements in star.

aqrule(seed,neg,c) =
  star := {true};
  for e ∈ neg do
    for r ∈ star do
      if covers(r,e) then
        star := (star ∪ {r&q | q ∈ newtests(r,seed,e)}) \ {r}
      fi;
      while size(star) > maxstar do
        star := star \ worstelements(star)
      od
    od
  od;
  return("if" bestin(star) "then" c)
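A simplified Python sketch of aqrules and aqrule for one class. It rests on several assumptions that are not in the slides: a rule is a set of tests of the form attribute = value (the empty set plays the role of true), newtests proposes only tests that copy a value from the seed, the least promising rules are taken to be the longest ones, the star is trimmed once per negative example, and the seed is assumed to differ from every negative example. The examples are hypothetical.

```python
def covers(rule, e):
    """True when example e satisfies every test (attribute, value) of the rule."""
    return all(e[a] == v for a, v in rule)

def newtests(rule, seed, e):
    """Tests satisfied by seed but not by e (simplified to attribute = seed value)."""
    return [frozenset([(a, seed[a])]) for a in seed if seed[a] != e[a]]

def aqrule(seed, neg, maxstar=3):
    star = {frozenset()}                 # the rule "true"
    for e in neg:
        for r in list(star):
            if covers(r, e):
                star = (star | {r | q for q in newtests(r, seed, e)}) - {r}
        # Assumption: shorter rules are more promising; keep at most maxstar.
        star = set(sorted(star, key=lambda r: (len(r), sorted(r)))[:maxstar])
    return min(star, key=lambda r: (len(r), sorted(r)))   # stands in for bestin

def prune(rules):
    """Drop rules that are more specific than some other rule."""
    return [r for r in rules if not any(o < r for o in rules)]

def aqrules(pos, neg, maxstar=3):
    rules = []
    for e in pos:
        if not any(covers(r, e) for r in rules):
            rules = prune(rules + [aqrule(e, neg, maxstar)])
    return rules

positive = [{"outlook": "overcast", "windy": "false"},
            {"outlook": "overcast", "windy": "true"},
            {"outlook": "rain", "windy": "false"}]
negative = [{"outlook": "sunny", "windy": "false"},
            {"outlook": "rain", "windy": "true"},
            {"outlook": "sunny", "windy": "true"}]
rules = aqrules(positive, negative)
for r in rules:
    print("if", " & ".join(f"{a}={v}" for a, v in sorted(r)), "then +")
```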
A clustering problem
(learning without a teacher)
Hierarchy of learning methods
Learning
  massively parallel learning
    neural nets
    genetic algorithms
  parametric learning
    by automata
    numerically
  symbolic learning
    search in concept space
      specific to general
      general to specific
    inductive inference
      inverse resolution
Decision tables
A decision table is a compact way of representing knowledge when the knowledge is used for making a choice from a finite (and not very large) set of possibilities.
A decision table is a composite table consisting of areas of three kinds.
List of conditions (C1, C2, …, Ck are conditions written in some formal language that can be translated into a program):
C1 C2 … Ck
Decision tables
Selection matrix, which consists of columns corresponding to the conditions and rows corresponding to the alternatives to choose from.
Each cell of the table can hold one of three values:
y – yes, the condition must be satisfied
n – no, the condition must not be satisfied
0 – it does not matter whether the condition is satisfied or not (the cell is then often simply left empty).
C1 C2 … Ck
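A minimal sketch of how a selection matrix can be evaluated. Each row is written as a string over y, n and 0 with one character per condition; the function name select, the alternative names and the example rows are hypothetical.

```python
def select(table, facts):
    """Return the alternatives whose row agrees with the condition values.

    table is a list of (alternative, row) pairs; facts[i] is True when
    condition C(i+1) is satisfied.
    """
    chosen = []
    for name, row in table:
        if all(cell == "0" or (cell == "y") == fact
               for cell, fact in zip(row, facts)):
            chosen.append(name)
    return chosen

#        C1 C2 C3
table = [("A", "yn0"),
         ("B", "0yy"),
         ("C", "nn0")]
print(select(table, [True, False, True]))
print(select(table, [False, True, True]))
```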
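Decision tables
The third area of the table holds the decisions to be selected. If there are two areas each of the first and of the second kind, the third area can also be laid out as a matrix:
[Figure: decision table with two condition areas (labelled Conditions), each with its y/n selection matrix, and a matrix of decisions (labelled Decisions).]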
Exercises
Sample data for ID3

Outlook   Temperature  Humidity  Windy  Class
Sunny     Hot          High      False  −
Sunny     Hot          High      True   −
Overcast  Hot          High      False  +
Rain      Mild         High      False  +
Rain      Cool         Normal    False  +
Rain      Cool         Normal    True   −
Overcast  Cool         Normal    True   +
Sunny     Mild         High      False  −
Sunny     Cool         Normal    False  +
Rain      Mild         Normal    False  +
Sunny     Mild         Normal    True   +
Overcast  Mild         High      True   +
Overcast  Hot          Normal    False  +
Rain      Mild         High      True   −

1. Calculate the entropies of the attributes.
2. Build a decision tree.