DATA MINING: Artificial Neural Networks
Alexey Minin, JASS 2006
An ANN forms its output by itself, according to the information presented at its input. The main task is to define a suitable functional and then minimize it; in the course of this minimization, the weights of the network are adjusted. In practice, adaptive networks encode the input information in the most compact way, subject to some predefined requirements.
Unsupervised learning: introduction
Reducing the dimension of the data with minimal loss
Unsupervised learning: redundancy of the data
The length of the data description: $d \cdot b$
- $d$ — the dimension of the data: the number of components of the input vector $x$;
- $b$ — the capacity of the data: the number of bits defining the possible variety of all values of $x$.
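For example, an input vector of $d = 64$ components stored with $b = 8$ bits per component has a description length of $d \cdot b = 512$ bits.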
Two ways of coding (reducing) the information:
- finding independent features (reducing the dimension of the data);
- reducing the variety of the data by detecting prototypes (clustering and quantization).
Two ways to reduce the data
Reducing the dimension allows us to describe the data with fewer components. Clustering allows us to reduce the variety of the data, i.e. the number of bits we need to describe it.
We can also combine both types of algorithms. We can use Kohonen maps, where the prototypes are arranged in a space of low dimension. For example, the input data can be mapped onto a 2-dimensional grid of prototypes in such a way that the data can be visualized.
Main idea: the neuron as indicator
[Diagram: a linear neuron with inputs $x_1, \dots, x_d$, weights $w_j$, and a single output $y$.]
The neuron has a single output and is trained on d-dimensional data. Let us say that the activation function is linear; the output therefore is a linear combination of its inputs:
$$y = \sum_{j=1}^{d} w_j\, x_j$$
After training is finished, the output amplitude can serve as an indicator for the data, showing whether or not the data corresponds to the training patterns.
Hebb training algorithm
According to Hebb:
$$\Delta w_j = \eta\, y\, x_j$$
If we reformulate the task as an optimization task, we obtain the characteristic property of such a neuron and the rule for defining the functional we have to minimize:
$$\Delta \mathbf{w} = -\eta\,\frac{\partial E}{\partial \mathbf{w}}, \qquad E(\mathbf{w}) = -\frac{1}{2}\,y^2 = -\frac{1}{2}\,(\mathbf{w}, \mathbf{x})^2$$
Indeed, $\partial E / \partial w_j = -y\, x_j$, so gradient descent on $E$ reproduces the Hebb rule.
NB! If we want to reach the minimum of $E$, the output amplitude must grow to infinity: pure Hebbian learning leads to unlimited growth of the weights.
Oja training rule
$$\Delta w_j = \eta\, y\,(x_j - y\, w_j)$$
The interfering term $-\eta\, y^2 w_j$ was added to stop the unlimited growth of the weights.
Oja's rule maximizes the sensitivity of the neuron's output at a limited amplitude of the weights. It is easy to verify this by equating the average change of the weights to zero and then multiplying the right-hand side of the equality by $w_j$ and summing over $j$: in equilibrium
$$\langle y^2 \rangle \left(1 - \|\mathbf{w}\|^2\right) = 0, \qquad \text{so} \quad \|\mathbf{w}\| = 1.$$
Thus the weights of the trained neuron lie on a hypersphere. Under Oja's training, the weight vector settles on the hypersphere in the direction maximizing the projection of the input vectors.
SUMMARY: the neuron tries to reproduce the value of its inputs from its known output. This means it maximizes the sensitivity of its output, the indicator, to the multidimensional input information, thereby compressing the data.
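A minimal numerical sketch of Hebbian learning with Oja's correction (the data, learning rate, and initialization are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data with one dominant direction (illustrative).
X = rng.normal(size=(1000, 2)) * np.array([3.0, 0.5])

w = rng.normal(size=2)
w /= np.linalg.norm(w)          # start on the unit sphere
eta = 0.01                      # learning rate (assumed)

for x in X:
    y = w @ x                   # linear neuron: y = sum_j w_j x_j
    w += eta * y * (x - y * w)  # Oja's rule: dw_j = eta*y*(x_j - y*w_j)

print(np.linalg.norm(w))  # stays ~1: the weights lie on the unit hypersphere
print(w)                  # ~ (+-1, 0): the direction of maximal projection
```

Dropping the $-y\,w$ term turns this back into the pure Hebb rule, and $\|\mathbf{w}\|$ then grows without bound.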
Oja training rule for a layer
$$\Delta w_{ij} = \eta\, y_i \left( x_j - \sum_{k=1}^{i} y_k\, w_{kj} \right)$$
NB! With the sum running over all outputs, the outputs of the Oja layer are linear combinations of the principal components. If you want to obtain the principal components themselves, you should restrict the sum over all outputs to $k \le i$, as written above.
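A sketch of the layer version (this is Sanger's formulation of the rule above; the sizes, data, and learning rate are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, eta = 3, 2, 0.005                  # assumed sizes and learning rate
X = rng.normal(size=(5000, d)) * np.array([3.0, 1.0, 0.3])

W = rng.normal(scale=0.1, size=(m, d))   # row i holds the weights of neuron i

for x in X:
    y = W @ x                            # y_i = sum_j w_ij x_j
    for i in range(m):
        # subtract the reconstruction by the first i outputs only (k <= i)
        x_hat = y[: i + 1] @ W[: i + 1]
        W[i] += eta * y[i] * (x - x_hat)

print(W)  # rows approximate the first m principal directions
```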
The analysis of principal components
Let us say that we have d-dimensional data $\mathbf{x}$ and we are training m linear neurons:
$$y_i = \sum_{j=1}^{d} w_{ij}\, x_j, \qquad i = 1, \dots, m$$
We want the amplitudes of all the output neurons to be independent indicators, fully reflecting the information about the multidimensional data we have.
THE TASK IS:
The requirement: the neurons must interact somehow (if we train them independently, we will obtain the same result for all of them).
In the simple case: let us take a perceptron with linear neurons in the hidden layer, in which the number of inputs equals the number of outputs and the weights with the same indices in both layers coincide. Let us try to teach this ANN to reproduce the input on the output:
$$x_1, \dots, x_d \;\longrightarrow\; y_i = \sum_j w_{ij}\, x_j \;\longrightarrow\; \tilde{x}_1, \dots, \tilde{x}_d$$
The training rule is therefore
$$\Delta w_{ij} = \eta\, y_i\,(x_j - \tilde{x}_j)$$
This looks like the Oja training rule!
Self-training layer:
In our formulation, the training of a separate neuron makes it reproduce its inputs from its output. Generalizing this observation, it is logical to suggest a rule according to which the values of the inputs are restored from the whole output information. Proceeding this way, we obtain the Oja training rule for a one-layer network:
$$\Delta w_{ij} = \eta\, y_i\,(x_j - \tilde{x}_j), \qquad \tilde{x}_j = \sum_k y_k\, w_{kj}$$
[Diagram: the inputs $x_1, \dots, x_d$ are reconstructed as $\tilde{x}_1, \dots, \tilde{x}_d$ from the outputs of the layer.]
The hidden layer of such an ANN, just like the Oja layer, performs an optimal coding of the input data and retains the maximum variety of the data under the existing restrictions.
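A sketch of this self-training layer (sizes and learning rate assumed); with tied weights, the reconstruction $\tilde{x} = W^\top y$ drives the update:

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, eta = 4, 2, 0.01                # assumed sizes and learning rate
X = rng.normal(size=(3000, d)) * np.array([2.0, 1.5, 0.2, 0.1])

W = rng.normal(scale=0.1, size=(m, d))

for x in X:
    y = W @ x                         # hidden layer: the compressed code
    x_hat = W.T @ y                   # tied weights reconstruct the input
    W += eta * np.outer(y, x - x_hat) # dw_ij = eta * y_i * (x_j - x~_j)

# The m hidden outputs now span the principal subspace of the data:
# x_hat = W.T @ W @ x is the best linear m-dimensional reconstruction.
```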
Example: let us change the activation function to a sigmoid in the training rule:
$$\Delta w_{ij} = \eta\, f(y_i) \left( x_j - \sum_k f(y_k)\, w_{kj} \right)$$
This brings a new property (Oja, et al., 1991). Such an algorithm, in particular, was used for the decomposition of signals mixed in an unknown way (i.e. blind signal separation). We face this task, for example, when we want to separate a human voice from noise.
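The same update with a sigmoid-type activation substituted in (tanh is used here as one common choice; this sketch only shows the form of the rule, not a complete blind-separation demo):

```python
import numpy as np

def nonlinear_oja_step(W, x, eta=0.01):
    """One update of the layer rule with activation f = tanh."""
    fy = np.tanh(W @ x)                      # f(y_i)
    x_hat = W.T @ fy                         # sum_k f(y_k) * w_kj
    return W + eta * np.outer(fy, x - x_hat)
```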
Competition of neurons: the winner takes all
$$y_i = \sum_{j=1}^{d} w_{ij}\, x_j$$
Basis algorithm
The training rule of the competitive layer remains the same. The winner is the neuron with the maximum response:
$$i^{*}: \quad (\mathbf{w}_{i^{*}}, \mathbf{x}) \ge (\mathbf{w}_i, \mathbf{x}) \quad \forall i$$
($i^{*}$ — the index of the winning neuron). Training of the winner:
$$\Delta \mathbf{w}_{i^{*}} = \eta\,(\mathbf{x} - \mathbf{w}_{i^{*}})$$
The outputs are $y_i = 1$ for $i = i^{*}$ and $y_i = 0$ otherwise.
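A sketch of one winner-take-all step (the learning rate is an assumed value):

```python
import numpy as np

def wta_step(W, x, eta=0.05):
    """Move only the winning prototype toward the input vector."""
    winner = np.argmax(W @ x)           # the neuron with the maximum response
    W[winner] += eta * (x - W[winner])  # train the winner: dw = eta*(x - w)
    y = np.zeros(len(W))
    y[winner] = 1.0                     # y_i = 1 for the winner, 0 otherwise
    return W, y
```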
The winner does not take all
One variant of updating the base training rule of a competitive layer consists in training not only the winner neuron but also its "neighbors", though with a smaller speed. Such an approach, "pulling up" the neurons nearest to the winner, is applied in topographic Kohonen maps.
The winner:
$$i^{*}: \quad \|\mathbf{x} - \mathbf{w}_{i^{*}}\| = \min_i \|\mathbf{x} - \mathbf{w}_i\|$$
The update:
$$\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \eta(t)\,\Lambda(i, i^{*})\,\bigl(\mathbf{x}(t) - \mathbf{w}_i(t)\bigr)$$
The neighborhood function $\Lambda(i, i^{*})$ is equal to unity for the winner neuron with index $i^{*}$ and gradually falls off with the distance from the winner, e.g.
$$\Lambda(i, i^{*}) = \exp\bigl(-\|\mathbf{a}_i - \mathbf{a}_{i^{*}}\|^2 / \sigma^2\bigr),$$
where $\mathbf{a}_i$ is the position of neuron $i$ on the grid.
Training à la Kohonen resembles stretching an elastic grid of prototypes over the data set from the training sample.
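A compact SOM training loop under the update above (the 10×10 grid, learning rate, and neighborhood width are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
grid = np.array([(i, j) for i in range(10) for j in range(10)])  # positions a_i
W = rng.uniform(size=(100, 2))       # prototypes w_i in data space
X = rng.uniform(size=(5000, 2))      # training sample (illustrative)
eta, sigma = 0.1, 2.0                # assumed learning rate and width

for x in X:
    winner = np.argmin(np.linalg.norm(W - x, axis=1))  # best-matching unit
    d2 = np.sum((grid - grid[winner]) ** 2, axis=1)    # squared grid distances
    h = np.exp(-d2 / sigma**2)                         # neighborhood function
    W += eta * h[:, None] * (x - W)                    # pull winner and neighbors
```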
Schematic representation of a self-organizing network
Methodology of self-organizing maps
Neurons in the target layer are ordered and correspond to the cells of a two-dimensional map, which can be colored by the principle of affinity of attributes.
A convenient tool for visualizing the data is the coloring of topographic maps, similar to how it is done on ordinary geographical maps. Every attribute of the data generates its own coloring of the map cells: by the average value of this attribute over the data points that fall into the given cell.
[Figure: visualization of the topographic map induced by the i-th component of the input data.]
Collecting together the maps of all the attributes of interest, we obtain a topographic atlas that gives an integrated representation of the structure of the multivariate data.
Classified SOM for the NASDAQ-100 index for the period from 10-Nov-1997 till 27-Aug-2001.
Methodology of self-organizing maps
Complexity of the algorithm
When is it better to use reduction of the dimension, and when quantization of the input information?

Reducing the dimension:
- number of operations: $C_1 \sim P\,W^2$, where $P$ is the number of training patterns and $W = d\,m$ is the number of synaptic weights of a one-layer ANN with $d$ inputs and $m$ output neurons;
- compression coefficient: $K = d/m$;
- complexity: $C_1 \sim P\,d^4/K^2$.

Quantization:
- number of operations: $C_2 \sim P\,W$;
- compression coefficient: $K = d\,b/\log_2 m$ ($b$ — the bit capacity of the data), so that $m = 2^{db/K}$ prototypes are needed;
- complexity: $C_2 \sim P\,d\,2^{db/K}$.

With the same compression coefficient:
$$\frac{C_2}{C_1} \sim \frac{K^2}{d^3}\, 2^{db/K}$$
JPEG example
The image is divided into 8×8-pixel blocks, which serve as the input vectors we want to compress; thus $d = 8 \cdot 8 = 64$. Let us suppose that the image contains $2^8 = 256$ gradations of gray, i.e. the accuracy of the represented data is $b = 8$ bits.
But if $d = 64 \times 64$, then $K > 10^3$.
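A quick numeric check of these estimates (a sketch using the formulas reconstructed above, so treat the constants as rough):

```python
# Number of prototypes m = 2^(d*b/K) that quantization needs, and the
# cost ratio C2/C1 ~ (K^2/d^3) * 2^(d*b/K) at equal compression K.
def prototypes(d, b, K):
    return 2 ** (d * b / K)

print(prototypes(64, 8, 64))         # 8x8 blocks, K=64: 256 prototypes -> feasible
print(prototypes(64 * 64, 8, 1000))  # 64x64 blocks: ~7e9 prototypes even at K=1000
```

Hence for 8×8 blocks quantization is practical, while for 64×64 blocks only compression coefficients above roughly $10^3$ would keep the codebook tractable.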
Any questions?