Cellular Neural Network Simulation and Modeling

(Oroszi Balázs)

Introduction

Real-time image and video processing is becoming an increasingly pressing need for
everyday use, not just in various industries and surveillance monitoring systems, but
also in the consumer market. One of the main challenges for microprocessor
manufacturers in the near future will be to build efficient processors and
infrastructure for the real-time handling of images and videos (or time signals in
general) coming from spatially distributed sources.

Because both of these tasks are closely related to spatio-temporal computing, great
effort has gone into creating supercomputers able to perform high-performance
spatio-temporal calculations. On the one hand, these operations need to be performed
in real time; on the other hand, extremely high accuracy is often not required. From
this perspective arises the possibility of using analog computation directly on signal
flows instead of digital computation on bits. Cellular Nonlinear Networks (CNNs)
fully realize this concept by introducing a new paradigm for analog signal processing.

This processing aspect (the highly parallel and analog nature of the CNN), combined
with programmability, leads to the concept of the so-called analogic computer. Based
on the large number of applications developed, CNNs can be considered a paradigm
for solving nonlinear spatio-temporal wave equations (a very difficult task in the
digital world) within the range of microseconds.

In 1988, papers by Leon O. Chua introduced the concept of the Cellular Neural
Network. CNNs can be defined as “2D or 3D arrays of mainly locally connected
nonlinear dynamical systems called cells, whose dynamics are functionally
determined by a small set of parameters which control the cell interconnection
strength” (Chua). These parameters determine the connection pattern and are
collected into the so-called cloning templates, which, once determined, define the
processing of the whole structure.

In this document, the main characteristics of the CNN are presented, and the
problems of simulating the analog CNN architecture in a digital environment are
discussed, along with a ready-to-use computer program that simulates the functional
behavior of the CNN and is capable of performing some of the most common image
processing tasks realized with CNN cloning templates.




Basic characteristics of the CNN

The basic idea of the CNN is to use an array of nonlinear dynamic circuits, called
cells, to process large amounts of information in real time. This concept was inspired
by the Cellular Automata and Neural Network architectures. The new architecture is
able to perform time-consuming tasks, such as image processing, while at the same
time being suitable for VLSI implementation. The original CNN model was
introduced in 1988. Each cell is characterized by $u_{ij}$, $y_{ij}$ and $x_{ij}$,
being the input, the output and the state variable of the cell, respectively.

The output is related to the state by the nonlinear equation:

$$y_{ij} = f(x_{ij}) = 0.5 \left( |x_{ij} + 1| - |x_{ij} - 1| \right)$$
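The output nonlinearity is easy to check numerically. The following is a minimal Python sketch, not code from the simulator itself; the function names `cnn_output` and `clamp_output` are chosen here for illustration only. It shows that the absolute-value form is the same as clipping the state to the interval [-1, 1].

```python
def cnn_output(x: float) -> float:
    """Piecewise-linear CNN output: y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def clamp_output(x: float) -> float:
    """Equivalent form: clip the state to the interval [-1, 1]."""
    return max(-1.0, min(1.0, x))


if __name__ == "__main__":
    # States inside [-1, 1] pass through unchanged; the rest saturate.
    for x in (-3.0, -1.0, -0.25, 0.0, 0.5, 1.0, 2.5):
        print(x, cnn_output(x), clamp_output(x))
```

Inside the linear region the output simply follows the state, while any state beyond ±1 saturates, which is what lets a converged network produce clean two-level images.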

The CNN can be defined as an M x N array of identical cells arranged in a
rectangular grid. Each cell is locally connected to its 8 nearest surrounding neighbors.

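The state transition of neuron (i, j) is governed by the following differential
equation:

$$\frac{dx_{ij}(t)}{dt} = -x_{ij}(t) + \sum_{C(k,l) \in S_r(i,j)} A(i,j,k,l)\, y_{kl}(t) + \sum_{C(k,l) \in S_r(i,j)} B(i,j,k,l)\, u_{kl} + z_{ij}$$

where C(i,j) represents the neuron at column i, row j, $S_r(i,j)$ represents the
neurons within radius r of the neuron C(i,j), and $z_{ij}$ is the threshold (bias) of
the cell C(i,j).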

The coefficients A(i, j, k, l) and B(i, j, k, l) are known as the cloning templates. In
general, they are nonlinear, time- and space-variant operators. If they are considered
linear, time- and space-invariant, they can simply be represented by matrices. This
means that they are constant throughout the whole processing of the CNN. The A
template controls the feedback of the output of the neighboring cells (A is called the
feedback template), while the B template controls the feed-forward of the input of the
neighboring cells (B is called the control template).
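To make the roles of the two templates concrete, here is a hedged Python sketch of the right-hand side of the state equation for 3x3 matrix templates. It assumes the standard form dx/dt = -x + A*y + B*u + z, treats cells outside the grid as contributing zero, and indexes grids as [row][column]. The names `output` and `state_derivative` are illustrative and are not taken from the plugin.

```python
def output(x):
    """Piecewise-linear output function of a cell."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def state_derivative(x, u, A, B, z):
    """Return dx/dt for every cell of a grid given as a list of rows.

    x, u : state and input grids of the same size
    A, B : 3x3 feedback and control templates (lists of rows)
    z    : scalar bias
    Cells outside the grid are assumed to contribute nothing.
    """
    rows, cols = len(x), len(x[0])
    dx = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = -x[i][j] + z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k, l = i + di, j + dj
                    if 0 <= k < rows and 0 <= l < cols:
                        # A weights the neighbors' outputs (feedback) ...
                        total += A[di + 1][dj + 1] * output(x[k][l])
                        # ... while B weights their inputs (feed-forward).
                        total += B[di + 1][dj + 1] * u[k][l]
            dx[i][j] = total
    return dx


if __name__ == "__main__":
    zero = [[0.0] * 3 for _ in range(3)]
    center = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    print(state_derivative([[0.0, 0.0]], [[1.0, -1.0]], zero, center, 0.5))
```

With a zero A template the derivative depends on the state only through the -x term, which is the situation exploited later for edge-detection style templates.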






Operation of the CNN

We shall investigate the operation of the CNN in the special case when the cloning
templates are considered linear, time- and space-invariant, and we shall focus on the
task of image processing and enhancement, as that is the primary objective of this
document.

In the field of image processing, though the operation of the CNN is continuous in
time, its distribution in space is discrete, so each cell essentially holds one pixel of
the image. There are 3 pictures present during the operation of the CNN: the
input (U), the state (X) and the output (Y).

At the initialization stage, an input image is fed (or uploaded) into the CNN through
its input matrix. Optionally, an initial state may also be set. After this, the
state-transition stage takes place (as described in the previous section), in which the
CNN converges to a stable state. This convergence takes place in the range of
microseconds, which makes the CNN a remarkably powerful and extremely fast image
processing architecture. At the end of the state transition, the final output is retrieved
according to the nonlinear output-state equation, as described above.

What the CNN essentially performs is a mapping from the input image U, with
cloning templates A and B, to the output image Y.

Typically, the cloning templates are of size 3x3. This size represents the case when
only the directly surrounding cells are connected.

There is one more thing that needs to be taken care of: the boundary conditions.
Boundary cells are those at the edges of the CNN array. Beyond them are so-called
irregular or virtual cells. They are not part of the regular CNN cells, so nothing is
uploaded into them in the initialization phase, and the state transition is not applied
to them either. This requires that the input and output of these virtual cells be
assigned in some other way. There are different methods for this:

Dirichlet (fixed) conditions: the input and output of the virtual cells are set to some
constant (typically 0).

Neumann (zero-flux) conditions: the input and output of the virtual cells are set to the
same as the neighboring boundary cells.

Toroid (periodic) conditions: the input and output of the virtual cells are set to the
same as the opposite side boundary cells.
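The three boundary rules can be sketched as a one-cell-wide ring of virtual cells added around a grid. This is an illustrative Python sketch only; the helper name `pad_with_virtual_cells` and the mode strings are assumptions made here, not part of the simulator's interface.

```python
def pad_with_virtual_cells(grid, mode, fixed_value=0.0):
    """Surround a grid (list of rows) with one ring of virtual cells.

    mode is one of "dirichlet", "neumann" or "toroid".
    """
    rows, cols = len(grid), len(grid[0])

    def cell(i, j):
        if 0 <= i < rows and 0 <= j < cols:
            return grid[i][j]                      # regular cell
        if mode == "dirichlet":
            return fixed_value                     # fixed constant
        if mode == "neumann":
            # Copy the nearest boundary cell (zero flux across the edge).
            return grid[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]
        if mode == "toroid":
            # Wrap around to the opposite side of the array.
            return grid[i % rows][j % cols]
        raise ValueError("unknown boundary mode: " + mode)

    return [[cell(i, j) for j in range(-1, cols + 1)]
            for i in range(-1, rows + 1)]


if __name__ == "__main__":
    small = [[1, 2], [3, 4]]
    for mode in ("dirichlet", "neumann", "toroid"):
        print(mode)
        for row in pad_with_virtual_cells(small, mode):
            print("  ", row)
```

Padding once up front keeps the inner update loop free of bounds checks, at the cost of a slightly larger buffer; whether the plugin does this is not stated in the text.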






Modeling and simulation of the CNN architecture

The true capabilities of CNNs for high-speed parallel processing are only fully
exploited by dedicated VLSI hardware realizations. Typical CNN chips may contain
up to 200 transistors per pixel. At the same time, industrial applications require
sufficiently large grid sizes (around 100 x 100). Thus, CNN chip designers must
confront complexity levels larger than $10^6$ transistors, most of them operating in
analog mode.

Simulation plays an important role in the design of the CNN cloning templates.
Therefore, it has to be fast enough to allow the design phase of various templates to
be accomplished in reasonable time. At the same time, the simulation has to be
accurate enough to reflect the behavior of the analog circuitry correctly and provide
reliable information to guide the designer in making the right decisions.

In practice, the simulation of the CNN involves a trade-off between accuracy and
computation time. On the one hand, high-level simulation, which focuses on
emulating the functional behavior, cannot realistically reflect the underlying
electronic circuitry. Its lack of detail makes it ill-suited for reliable IC simulation.
On the other hand, SPICE-type transistor-level simulators, although very accurate,
are barely capable of handling more than about $10^5$ transistors and may take
several days of CPU time for circuit netlists containing about $10^6$ transistors.
Hence, these low-level tools are ill-suited for simulating large CNN chips.

In a straightforward approach, high-level simulators might be configured and
programmed to achieve a more realistic simulation of the CNN hardware, but at the
expense of considerable coding effort and a great increase in CPU usage, resulting
in a slowdown of the simulation. Alternatively, the use of macromodels in SPICE-like
tools decreases CPU time consumption. This results in a simplified but still
accurate description of the circuit, although the simulator core still has to handle the
whole network interconnection topology. Again, the limitations on computing power
make this approach inefficient when dealing with large CNN chips.

Therefore, it is necessary to bridge the gap between these approaches; that is, in order
to achieve the efficient simulation of CNN chips, an intermediate solution must be
found.

Even though this intermediate solution would give the best results from a design
point of view, it is beyond the scope of this document. In the rest of this document we
shall focus on the functional modeling of the CNN architecture.






Functional modeling of the CNN architecture

The output of a CNN model simulation is the final state reached by the network after
evolving from an initial state under the influence of a specific input and boundary
conditions. The following block diagram shows the state transition and output of a
single cell:

In the most general case, the final state of one cell can be described by the following
differential equation:


or

where dx(t)/dt is given as described in the previous sections.

As a closed form for the solution of the above equation cannot be given, it must be
integrated numerically.

For the simulation of such equations on a digital computer, they must be mapped into
a discrete-time system that emulates the continuous-time behavior, has similar
dynamics and converges to the same final state. The error committed by this
emulation depends on the form of the discrete-time system selected, which in turn
depends on the choice of the method of integration, i.e. the way in which the integral
is calculated.









There is a wide variety of integration algorithms that can be used to perform this task.
However, only three of them are going to be considered here. These methods are:

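the explicit Euler’s formula:

$$x(t + h) = x(t) + h\, f(x(t)),$$

the predictor-corrector algorithm:

$$x(t + h) = x(t) + \frac{h}{2} \left[ f(x(t)) + f\!\left(x^{(0)}(t + h)\right) \right]$$

where $x^{(0)}(t + h) = x(t) + h\, f(x(t))$,

and the fourth-order Runge-Kutta method:

$$x(t + h) = x(t) + \frac{h}{6} \left( k_1 + 2 k_2 + 2 k_3 + k_4 \right)$$

where

$$k_1 = f(x(t)), \quad k_2 = f\!\left(x(t) + \tfrac{h}{2} k_1\right), \quad k_3 = f\!\left(x(t) + \tfrac{h}{2} k_2\right), \quad k_4 = f\!\left(x(t) + h\, k_3\right).$$

Here $f(x) = dx/dt$ denotes the right-hand side of the state equation and h is the
integration timestep.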

Of these, the Euler method is the fastest but gives the least accurate convergence
behavior, while Runge-Kutta gives the best results but is much slower.

In the Runge-Kutta algorithm, four auxiliary components (k1-k4) are computed.
These are auxiliary states, which are then averaged.

For applications targeting accuracy and robustness, Runge-Kutta would undoubtedly
be the method of choice. In our case, however, as the primary target is a fast, working
implementation of a CNN simulator as an image processor, we shall choose the Euler
method.
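The speed versus accuracy trade-off between the three methods can be illustrated on a single cell without feedback, where dx/dt = -x + c has the known solution x(t) = c + (x0 - c) e^(-t). The sketch below is an illustration in Python under these assumptions: the predictor-corrector is taken to be the usual second-order (Heun) form, and all function names are hypothetical.

```python
import math


def euler_step(f, x, h):
    """One explicit Euler step."""
    return x + h * f(x)


def heun_step(f, x, h):
    """One predictor-corrector step (Euler predictor, trapezoidal corrector)."""
    predictor = x + h * f(x)
    return x + 0.5 * h * (f(x) + f(predictor))


def rk4_step(f, x, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def integrate(step, f, x0, h, n):
    """Apply the given step function n times starting from x0."""
    x = x0
    for _ in range(n):
        x = step(f, x, h)
    return x


def errors(c=0.5, x0=0.0, h=0.1, n=10):
    """Absolute error of each method against the exact solution at t = n*h."""
    f = lambda x: -x + c
    exact = c + (x0 - c) * math.exp(-h * n)
    methods = (("euler", euler_step), ("heun", heun_step), ("rk4", rk4_step))
    return {name: abs(integrate(step, f, x0, h, n) - exact)
            for name, step in methods}


if __name__ == "__main__":
    for name, err in errors().items():
        print(name, err)
```

Euler needs one evaluation of the right-hand side per step, the predictor-corrector two and Runge-Kutta four, so on a full image the cost per iteration grows roughly in that proportion while the error shrinks.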






Handling special cases for increasing performance

It is not uncommon for templates that extract local properties of the image (like
edge detectors) to use a fully zero A template. I have discovered that by handling this
special case separately, we can speed up processing dramatically.

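Given A = 0, the state equation takes the following form:

$$\frac{dX}{dt} = -X + BU + Z$$

(BU + Z) is constant, as U does not change during the process. Let: BU + Z = C

Using Euler integration with timestep h:

$$X(n+1) = X(n) + h \left( -X(n) + C \right) = (1 - h)\, X(n) + hC$$

$$X(1) = (1 - h)\, X(0) + hC$$

$$X(2) = (1 - h)^2 X(0) + (1 - h)\, hC + hC$$

$$X(3) = (1 - h)^3 X(0) + (1 - h)^2 hC + (1 - h)\, hC + hC$$

The pattern can clearly be seen by now. In each new step $X(0)$ gets multiplied
by $(1 - h)$, so its power index increases. The remaining part is a geometric series, so
the general equation for calculating the n-th state is:

$$X(n) = (1 - h)^n X(0) + hC \sum_{k=0}^{n-1} (1 - h)^k$$

Using the general formula for calculating the sum of a geometric series:

$$\sum_{k=0}^{n-1} q^k = \frac{1 - q^n}{1 - q}$$

the sum of the above geometric series turns into:

$$\sum_{k=0}^{n-1} (1 - h)^k = \frac{1 - (1 - h)^n}{h}$$

So the state equation using this will be:

$$X(n) = (1 - h)^n X(0) + \left( 1 - (1 - h)^n \right) C$$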

This result is of utmost importance regarding the speed of the simulation, because the
number of iterations that need to be performed is reduced to 1. We can get to the final
state immediately, given the U input, the B template and the Z bias. As multiple
iterations through an image cause lots of non-cacheable memory accesses (which are
very slow), this improvement in the special case of A = 0 gives a huge boost in speed.
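The shortcut can be checked numerically for a single cell. The following Python sketch assumes the Euler update x <- x + h(-x + C) with C = BU + Z held constant, and the closed form (1 - h)^n x0 + (1 - (1 - h)^n) C obtained by summing the geometric series; the function names are illustrative.

```python
def euler_iterate(x0, c, h, n):
    """Run n explicit Euler steps of dx/dt = -x + c, one after another."""
    x = x0
    for _ in range(n):
        x = x + h * (-x + c)
    return x


def closed_form(x0, c, h, n):
    """Jump directly to the n-th Euler state using the geometric-series sum."""
    decay = (1.0 - h) ** n
    return decay * x0 + c * (1.0 - decay)


if __name__ == "__main__":
    x0, c = 0.3, -0.7
    for h, n in ((0.1, 40), (0.2, 25), (1.0, 1)):
        print(h, n, euler_iterate(x0, c, h, n), closed_form(x0, c, h, n))
```

Both functions agree to rounding error, but the closed form touches each pixel once regardless of n, which is where the saving in memory traffic comes from.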






Realization of the CNN simulator

To get from theory to practice, and to implement a fast, reasonably correct CNN
simulator within a limited timeframe that is capable of demonstrating some real-life
examples of practical CNN applications in the image processing field, the right
combination of software and development tools had to be chosen. To speed up
development, and to skip the time-consuming task of implementing low-level image
reading/writing/displaying code, the CNN simulator is realized as an AviSynth plugin.

AviSynth (http://www.avisynth.org) is a powerful tool for video post-production. It
provides almost unlimited ways of editing and processing videos. AviSynth works as
a frameserver, providing instant editing without the need for temporary files.

AviSynth itself does not provide a graphical user interface (GUI) but instead relies on
a script system that allows advanced non-linear editing. While this may at first seem
tedious and unintuitive, it is remarkably powerful and is a very good way to manage
projects in a precise, consistent, and reproducible manner. Because text-based scripts
are human readable, projects are inherently self-documenting. The scripting language
is simple yet powerful, and complex filters can be created from basic operations to
develop a sophisticated palette of useful and unique effects.

AviSynth uses a special programming language, designed specifically for video
processing. It is a scripting language, and its functions are implemented under the
hood as C/C++ dynamic link libraries (DLLs), which are called plugins. These
plugins expose an interface towards the scripting language, from which these
functions can be called. This allows very flexible use and parameterization.

AviSynth itself is composed of a huge collection of plugins. The CNN simulator
is also implemented as a plugin. This makes it possible to use it in AviSynth’s
scripting language, effectively and easily processing any type of image or video that
AviSynth can open. The programming language of choice was C++, and the plugin
uses AviSynth’s compiler-independent C interface. The following parameters can be
set from within the scripting language:



A template
B template
Z bias
X initial state (including either initial image/clip or initial value)
U initial image, boundary conditions
Timestep of Euler integration
Number of iterations to perform
Black color value
White color value







The plugin operates in the YUY2 colorspace, which is very well suited for grayscale
image processing. In the YUY2 colorspace, pixel data is separated into chrominance
and luminance information and is arranged in the following way (each component
being 8 bits long):

[Y][Cb][Y][Cr]

This block represents 2 pixels. Horizontal color resolution is half that of RGB24,
but luminance resolution is unchanged.
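Since only luminance matters for grayscale processing, a packed YUY2 row can be reduced to its Y samples by taking every second byte. The sketch below is a Python illustration of that layout, not the plugin's C++ code; `luma_from_yuy2` is a name chosen here.

```python
def luma_from_yuy2(row_bytes):
    """Return the luminance samples of one packed YUY2 row.

    Each 4-byte block holds two pixels; the bytes at even offsets are the
    two Y samples, and the chroma bytes in between are skipped.
    """
    if len(row_bytes) % 4 != 0:
        raise ValueError("a YUY2 row must be a whole number of 4-byte blocks")
    return list(row_bytes[0::2])


if __name__ == "__main__":
    # Two blocks = four pixels; chroma bytes are arbitrary here.
    row = bytes([16, 128, 235, 128, 100, 90, 50, 200])
    print(luma_from_yuy2(row))
```

A row of W pixels therefore occupies 2W bytes, and the grayscale image seen by the CNN has exactly W luminance values per row.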

Each pixel’s luminance ranges between 0 and 255 (8-bit grayscale). This range is then
converted to a custom range (represented by 4-byte floating point values) specified
by the Black and White color value script function parameters. Care needs to be
taken to adjust these parameters correctly, as each CNN template operates correctly
only in a very specific range, and it is the template designer’s task to specify the
value range when designing the templates.
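The range conversion can be pictured as a linear map between 8-bit luminance and the template's working range. The Python sketch below assumes that luminance 0 maps to the Black value and 255 to the White value, with clipping on the way back; this linear form and the helper names are assumptions, as the text does not spell out the exact conversion used by the plugin.

```python
def luma_to_cnn(luma, black, white):
    """Map an 8-bit luminance (0..255) linearly onto [black, white]."""
    return black + (white - black) * (luma / 255.0)


def cnn_to_luma(value, black, white):
    """Map a CNN value back to 0..255, clipping anything out of range."""
    t = (value - black) / (white - black)
    t = min(1.0, max(0.0, t))
    return int(round(255.0 * t))


if __name__ == "__main__":
    # The same pixel under two conventions for black and white.
    for black, white in ((-1.0, 1.0), (1.0, -1.0)):
        v = luma_to_cnn(64, black, white)
        print(black, white, v, cnn_to_luma(v, black, white))
```

Swapping the two parameters inverts the image as seen by the template, which is why a template designed for one convention gives wrong results under the other.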

Here is an example script that does simple black and white edge detection (it also
requires the adaptive B/W conversion plugin that accompanies the CNN plugin):


LoadCPlugin("C:\Program Files\AviSynth 2.5\plugins\c\cnn.dll")
LoadCPlugin("C:\Program Files\AviSynth 2.5\plugins\c\adaptivebw.dll")

DirectShowSource("C:\video.mpg")

# --- BW edge ---
AdaptiveBW()
CNN(
\ "A",
\ 0,0,0,
\ 0,0,0,
\ 0,0,0,
\ "B",
\ -1,-1,-1,
\ -1, 8,-1,
\ -1,-1,-1,
\ Z=-1.0,
\ XInitValue=0,
\ YInitValue=0,
\ timestep=1,
\ iterations=1,
\ black=-1,
\ white=1
\ )
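What this template does to a two-level image can be reproduced offline. The Python sketch below is an approximation under stated assumptions: with A = 0 and a unit timestep the final state is taken to be X = BU + Z, virtual cells copy the nearest edge pixel, and black and white are -1 and +1 as in the script. The function and variable names are illustrative and unrelated to the plugin's internals.

```python
def clamp(x):
    """CNN output nonlinearity: limit the state to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def apply_control_template(u, B, z):
    """Return Y = f(B*U + Z) for a 3x3 template B, replicating edge pixels."""
    rows, cols = len(u), len(u[0])
    y = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            x = z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # Clamp indices so virtual cells copy the nearest pixel.
                    k = min(max(i + di, 0), rows - 1)
                    l = min(max(j + dj, 0), cols - 1)
                    x += B[di + 1][dj + 1] * u[k][l]
            out_row.append(clamp(x))
        y.append(out_row)
    return y


EDGE_B = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
EDGE_Z = -1.0

# 5x5 test image: a white (+1) 3x3 square on a black (-1) background.
image = [[1.0 if 1 <= i <= 3 and 1 <= j <= 3 else -1.0 for j in range(5)]
         for i in range(5)]
edges = apply_control_template(image, EDGE_B, EDGE_Z)

if __name__ == "__main__":
    for row in edges:
        print(" ".join("#" if v > 0 else "." for v in row))
```

For this input only the outline of the square stays white: a white pixel with at least one black neighbor saturates to +1, while uniform regions and black pixels settle at -1.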


Real-time processing of captured video:

To achieve direct real-time processing of captured video, the excellent ffdshow raw
video DirectShow filter can be used with its built-in AviSynth post-processing
capability.






Speed, accuracy and limitations of the CNN simulator plugin

As mentioned earlier, every CNN simulator has to make a trade-off between speed
and accuracy. During the development of the CNN AviSynth plugin, the primary
target was traditional PCs, nowadays available in every household. As such, no
specific hardware was expected to be available to aid the processing of the CNN
simulator. This means that the simulator itself is not as robust as other specialized
CNN simulators, but it is decently accurate and fast. By selecting the right timestep
and iteration count for a given template, very good and accurate results can be
achieved.

I cannot stress enough that the most important parameters of the CNN simulator are
the timestep and iteration count parameters (along with the cloning templates, of
course).

For fine results, choose a relatively small timestep (around 0.1-0.2) and a high
iteration count (around 20-40). This will most likely give good results, but
processing speed will drop to a crawl. For quasi real-time performance (15-25 FPS),
set the iteration count to 1-5 and increase the timestep parameter accordingly. Also,
make sure not to process images with too high a resolution. Resolutions around
400x300 with an iteration count of 1-5 should give good performance.
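A rough feel for how timestep and iteration count interact can be had from the feedback-free case, where each Euler step multiplies the remaining distance to the final state by (1 - timestep). The Python sketch below illustrates only that linear regime; templates with a nonzero A template can behave differently, and `residual_factor` is a name introduced here.

```python
def residual_factor(timestep, iterations):
    """Fraction of the initial error left after Euler steps of dx/dt = -x + c.

    Each step scales the distance to the fixed point by |1 - timestep|,
    so values above 1 mean the iteration diverges.
    """
    return abs(1.0 - timestep) ** iterations


if __name__ == "__main__":
    for timestep, iterations in ((0.1, 40), (0.2, 20), (0.5, 5), (1.0, 1), (2.5, 3)):
        print(timestep, iterations, residual_factor(timestep, iterations))
```

Under this simplified model a few large steps can get as close to the final state as many small ones, but a timestep above 2 makes the error grow instead of shrink, so the timestep cannot be increased without limit.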

It has to be noted that when the A template is fully 0, the modified form of the state
equation is used (as presented in Handling special cases for increasing performance).
So the iterations parameter no longer specifies the number of iterations, as that is
always 1 in this case, but is used in a different manner. See the noted section for
details.

Though the code is decently optimized, a great deal of floating point calculation is
required, as well as lots of memory accesses, both of which have a huge impact on
performance.

The simulator correctly simulates the state-transition rule using the Euler integration
method. Though it is the fastest method, it may not give the most accurate result. This
limitation comes from the design choice of favoring speed.

An up-to-date version of this document can be found at:

http://digitus.itk.ppke.hu/~oroba/cnn/

along with other useful information and the CNN simulator program itself.

Make sure to check out the following directories:

http://digitus.itk.ppke.hu/~oroba/cnn/documentation/ (ppt presentation!)

http://digitus.itk.ppke.hu/~oroba/cnn/pictures/ (screenshots!)

References:

Cellular Neural Networks: A paradigm for Nonlinear Spatio-Temporal Processing by
Luigi Fortuna, Paolo Arena, Dávid Bálya and Ákos Zarándy

Behavioral Modeling and Simulation of CNN Chips by R. Carmona, R. Domínguez
Castro, S. Espejo and A. Rodríguez-Vázquez