Cellular Neural Network Simulation and Modeling
(Oroszi Balázs)
Introduction
Real-time image and video processing is an increasingly pressing need in everyday use, not only in industry and in surveillance systems, but also in the consumer market. One of the main challenges for microprocessor manufacturers in the near future will be to build efficient processors and infrastructure for the real-time handling of images and videos (or time signals in general) coming from spatially distributed sources.
Because both of these tasks are closely tied to spatio-temporal computing, great effort has gone into building supercomputers capable of high-performance spatio-temporal calculations. On the one hand, these operations need to be performed in real time; on the other hand, extremely high accuracy is often not required. This opens up the possibility of using analog computation directly on signal flows instead of digital computation on bits. Cellular Nonlinear Networks (CNNs) fully realize this concept by introducing a new paradigm for analog signal processing.
This processing aspect (the highly parallel and analog nature of the CNN), combined with programmability, leads to the concept of the so-called analogic computer. Based on the large number of applications developed, CNNs can be considered a paradigm for solving nonlinear spatio-temporal wave equations (a very difficult task in the digital world) within microseconds.
In 1988, papers by Leon O. Chua introduced the concept of the Cellular Neural Network. CNNs can be defined as “2D or 3D arrays of mainly locally connected nonlinear dynamical systems called cells, whose dynamics are functionally determined by a small set of parameters which control the cell interconnection strength” (Chua). These parameters determine the connection pattern and are collected into the so-called cloning templates, which, once determined, define the processing of the whole structure.
This document presents the main characteristics of the CNN and discusses the problems of simulating the analog CNN architecture in a digital environment. It is accompanied by a ready-to-use computer program that simulates the functional behavior of the CNN and is capable of performing some of the most common image processing tasks realized with CNN cloning templates.
Basic characteristics of the CNN
The basic idea of the CNN is to use an array of nonlinear dynamic circuits, called cells, to process large amounts of information in real time. This concept was inspired by the Cellular Automata and Neural Network architectures. The architecture can perform time-consuming tasks, such as image processing, while at the same time being suitable for VLSI implementation. The original CNN model was introduced in 1988. Each cell is characterized by $u_{ij}$, $y_{ij}$ and $x_{ij}$, being the input, the output and the state variable of the cell, respectively.
The output is related to the state by the nonlinear equation:
$$y_{ij} = f(x_{ij}) = 0.5\,\left(|x_{ij} + 1| - |x_{ij} - 1|\right)$$
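As a quick illustration of this output nonlinearity, the following Python sketch evaluates the formula and compares it with an equivalent clamp of the state to the range [-1, 1]. The function names are illustrative only and are not taken from the simulator described in this document.

```python
def cnn_output(x: float) -> float:
    """Piecewise-linear CNN output: y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def clamp_output(x: float) -> float:
    """Equivalent formulation: the state clamped to [-1, 1]."""
    return max(-1.0, min(1.0, x))


if __name__ == "__main__":
    for state in (-2.5, -1.0, -0.3, 0.0, 0.3, 1.0, 5.0):
        print(state, cnn_output(state), clamp_output(state))
```

Inside the interval [-1, 1] the output simply follows the state; outside it, the output saturates at -1 or +1.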
The CNN can be defined as an M x N array of identical cells arranged in a rectangular grid. Each cell is locally connected to its 8 nearest surrounding neighbors. The state transition of neuron (i, j) is governed by the following differential equation (given here in its standard normalized form):
$$\frac{dx_{ij}(t)}{dt} = -x_{ij}(t) + \sum_{C(k,l) \in S_r(i,j)} A(i,j,k,l)\, y_{kl}(t) + \sum_{C(k,l) \in S_r(i,j)} B(i,j,k,l)\, u_{kl} + z_{i,j}$$
where C(i,j) represents the neuron at column i, row j, $S_r(i,j)$ represents the neurons within radius r of the neuron C(i,j), and $z_{i,j}$ is the threshold (bias) of the cell C(i,j).
The coefficients A(i,j,k,l) and B(i,j,k,l) are known as the cloning templates. In general, they are nonlinear, time- and space-variant operators. If they are considered linear, time- and space-invariant, they can simply be represented by matrices. This means that they are constant throughout the whole processing of the CNN. The A template controls the feedback of the output of the neighboring cells (A is called the feedback template), while the B template controls the feed-forward of the input of the neighboring cells (B is called the control template).
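To make the roles of the two templates concrete, here is a minimal Python sketch of the right-hand side of the state equation. It rests on several assumptions that are not spelled out in the text: the templates are linear, space-invariant 3x3 matrices, the state equation has the normalized form dx/dt = -x + (A applied to Y) + (B applied to U) + z, and cells outside the grid read as 0. All function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def cnn_output(x: float) -> float:
    """Piecewise-linear output nonlinearity of a cell."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def neighborhood_sum(template: Grid, image: Grid, i: int, j: int,
                     virtual: float = 0.0) -> float:
    """Template-weighted sum over the 3x3 neighborhood of cell (i, j).

    Cells outside the grid are read as `virtual` (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            inside = 0 <= r < rows and 0 <= c < cols
            value = image[r][c] if inside else virtual
            total += template[di + 1][dj + 1] * value
    return total


def state_derivative(x: Grid, u: Grid, A: Grid, B: Grid, z: float) -> Grid:
    """dx/dt for every cell: -x + feedback (A on outputs) + control (B on inputs) + z."""
    y = [[cnn_output(v) for v in row] for row in x]
    return [[-x[i][j]
             + neighborhood_sum(A, y, i, j)
             + neighborhood_sum(B, u, i, j)
             + z
             for j in range(len(x[0]))]
            for i in range(len(x))]
```

The A template only ever sees the outputs Y, and the B template only ever sees the inputs U, which is why a fully zero A template removes all coupling between the states of neighboring cells.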
Operation of the CNN
We shall investigate the operation of the CNN in the special case where the cloning templates are linear, time- and space-invariant, and we shall focus on the task of image processing and enhancement, as that is the primary objective of this document.
In the field of image processing, the operation of the CNN is continuous in time but discrete in space, so each cell essentially holds one pixel of the image. Three pictures are present while the CNN is working: the input (U), the state (X) and the output (Y).
At the initialization stage, an input image is fed (or uploaded) into the CNN through its input matrix. Optionally, an initial state may also be set. After this, the state-transition stage takes place (as described in the previous section), during which the CNN converges to a stable state. This convergence takes place within microseconds, which makes the CNN a remarkably powerful and extremely fast image processing architecture. At the end of the state transition, the final output is retrieved according to the nonlinear output-state equation described above.
What the CNN essentially performs is a mapping from the input image U, by means of the cloning templates A and B, to the output image Y.
Typically, the cloning templates are of size 3x3. This size represents the case when only the directly surrounding cells are connected.
There is one more thing that needs to be taken care of: the boundary conditions. Boundary cells are those at the edges of the CNN array. Beyond them are the so-called irregular or virtual cells. They are not part of the regular CNN cells, so nothing is uploaded into them in the initialization phase, and the state transition is not applied to them either. This requires that the input and output of these neurons be assigned somehow. There are different methods for this:
– Dirichlet (fixed) conditions: the input and output of the virtual cells are set to some constant (typically 0).
– Neumann (zero-flux) conditions: the input and output of the virtual cells are set to the same values as the neighboring boundary cells.
– Toroid (periodic) conditions: the input and output of the virtual cells are set to the same values as the boundary cells on the opposite side.
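The three boundary conditions can be sketched in Python as a function that surrounds an image with one ring of virtual cells. This is only an illustration of the definitions above; the function name `pad_with_virtual_cells` and the mode strings are made up for this example and do not correspond to any parameter of the simulator.

```python
from typing import List

Grid = List[List[float]]


def pad_with_virtual_cells(image: Grid, mode: str, constant: float = 0.0) -> Grid:
    """Return the image surrounded by one ring of virtual cells.

    mode: "dirichlet" (fixed constant), "neumann" (copy of the nearest
    boundary cell) or "toroid" (copy of the opposite-side boundary cell).
    """
    rows, cols = len(image), len(image[0])

    def pick(r: int, c: int) -> float:
        if 0 <= r < rows and 0 <= c < cols:
            return image[r][c]                      # regular cell
        if mode == "dirichlet":
            return constant
        if mode == "neumann":
            return image[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]
        if mode == "toroid":
            return image[r % rows][c % cols]        # wrap around the grid
        raise ValueError(f"unknown boundary mode: {mode}")

    return [[pick(r, c) for c in range(-1, cols + 1)]
            for r in range(-1, rows + 1)]


if __name__ == "__main__":
    tiny = [[1.0, 2.0], [3.0, 4.0]]
    for mode in ("dirichlet", "neumann", "toroid"):
        print(mode, pad_with_virtual_cells(tiny, mode))
```

With a 3x3 template, one ring of virtual cells is enough, because each cell only reaches its directly surrounding neighbors.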
Modeling and simulation of the CNN architecture
The true capabilities of CNNs for high-speed parallel processing are only fully exploited by dedicated VLSI hardware realizations. Typical CNN chips may contain up to 200 transistors per pixel. At the same time, industrial applications require sufficiently large grid sizes (around 100 x 100). Thus, CNN chip designers must confront complexity levels larger than $10^6$ transistors, most of them operating in analogue mode.
Simulation plays an important role in the design of the CNN cloning templates. Therefore, it has to be fast enough to allow the design phase of various templates to be accomplished in reasonable time. At the same time, the simulation has to be accurate enough to reflect the behavior of the analog circuitry correctly and to provide reliable information to guide the designer in making the right decisions.
In practice, the simulation of the CNN involves a trade-off between accuracy and computation time. On the one hand, high-level simulation, which focuses on emulating the functional behaviour, cannot realistically reflect the underlying electronic circuitry. Its lack of detail makes it ill-suited for reliable IC simulation. On the other hand, SPICE-type transistor-level simulators, although very accurate, are barely capable of handling more than about $10^5$ transistors and may take several days of CPU time for circuit netlists containing about $10^6$ transistors. Hence, these low-level tools are ill-suited for simulating large CNN chips.
In a straightforward approach, high-level simulators might be configured and programmed to achieve a more realistic simulation of the CNN hardware, but at the expense of considerable coding effort and a great increase in CPU usage, resulting in a slowdown of the simulation. Alternatively, the use of macromodels in SPICE-like tools decreases CPU time consumption. This results in a simplified but still accurate description of the circuit, although the simulator core still has to handle the whole network interconnection topology. Again, the limitations on computing power make this approach inefficient when dealing with large CNN chips.
Therefore, it is necessary to bridge the gap between these approaches; that is, in order to simulate CNN chips efficiently, an intermediate solution must be found.
Even though this intermediate solution would give the best results from a design point of view, it is beyond the scope of this document. In the rest of this document we shall focus on the functional modeling of the CNN architecture.
Functional modeling of the CNN architecture
The output of a CNN model simulation is the final state reached by the network after evolving from an initial state under the influence of a specific input and boundary conditions. The following block diagram shows the state-transition and output of a single cell:
[Figure: block diagram of a single CNN cell]
In the most general case, the final state of one cell is obtained by integrating the state equation over time:
$$x_{ij}(t_{end}) = x_{ij}(t_0) + \int_{t_0}^{t_{end}} \frac{dx_{ij}(t)}{dt}\,dt$$
where dx(t)/dt is given as described in the previous sections.
As a closed form for the solution of the above equation cannot be given, it must be
integrated numerically.
To simulate such equations on a digital computer, they must be mapped into a discrete-time system that emulates the continuous-time behavior, has similar dynamics and converges to the same final state. The error committed by this emulation depends on the form of the discrete-time system selected, which in turn depends on the choice of the method of integration, i.e. the way in which the integral is calculated.
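There is a wide variety of integration algorithms that can be used to perform this task. However, only three of them are considered here. In the formulas below, f(x) = dx/dt denotes the right-hand side of the state equation, h the timestep and x(n) the state after n steps. These methods are:
the explicit Euler formula:
$$x(n+1) = x(n) + h\,f(x(n)),$$
the predictor-corrector algorithm:
$$x(n+1) = x(n) + \frac{h}{2}\left[f(x(n)) + f\!\left(x^{(0)}(n+1)\right)\right]$$
where
$$x^{(0)}(n+1) = x(n) + h\,f(x(n)),$$
and the fourth-order Runge-Kutta method:
$$x(n+1) = x(n) + \frac{1}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right)$$
where
$$k_1 = h\,f(x(n)), \quad k_2 = h\,f\!\left(x(n) + \tfrac{1}{2}k_1\right), \quad k_3 = h\,f\!\left(x(n) + \tfrac{1}{2}k_2\right), \quad k_4 = h\,f(x(n) + k_3)$$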
Of these, the Euler method is the fastest but gives the least accurate convergence behaviour, while Runge-Kutta gives the best results but is much slower.
In the Runge-Kutta algorithm, four auxiliary components (k1-k4) are computed. These are auxiliary states, which are then averaged.
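The speed/accuracy ranking of the three methods can be checked on a small test problem. The sketch below applies the textbook forms of explicit Euler, a predictor-corrector step (Heun's method, which is assumed here to be the variant meant above) and fourth-order Runge-Kutta to the scalar equation dx/dt = -x + c, which has a known exact solution. The test equation, the constant `C` and all function names are choices made for this illustration.

```python
import math
from typing import Callable

Rhs = Callable[[float], float]


def euler_step(f: Rhs, x: float, h: float) -> float:
    """Explicit Euler: one evaluation of f per step."""
    return x + h * f(x)


def predictor_corrector_step(f: Rhs, x: float, h: float) -> float:
    """Heun's method: Euler predictor followed by a trapezoidal corrector."""
    predicted = x + h * f(x)
    return x + 0.5 * h * (f(x) + f(predicted))


def rk4_step(f: Rhs, x: float, h: float) -> float:
    """Classical fourth-order Runge-Kutta: weighted average of k1..k4."""
    k1 = h * f(x)
    k2 = h * f(x + 0.5 * k1)
    k3 = h * f(x + 0.5 * k2)
    k4 = h * f(x + k3)
    return x + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def integrate(step, f: Rhs, x0: float, h: float, n: int) -> float:
    """Apply `step` n times starting from x0."""
    x = x0
    for _ in range(n):
        x = step(f, x, h)
    return x


C = 0.5  # constant drive term of the test equation


def rhs(x: float) -> float:
    return -x + C


# Exact solution of dx/dt = -x + C with x(0) = 0, evaluated at t = 1.
exact = C * (1.0 - math.exp(-1.0))
errors = {
    name: abs(integrate(step, rhs, 0.0, 0.1, 10) - exact)
    for name, step in (("euler", euler_step),
                       ("predictor_corrector", predictor_corrector_step),
                       ("rk4", rk4_step))
}

if __name__ == "__main__":
    for name, err in errors.items():
        print(f"{name:20s} error = {err:.2e}")
```

Euler needs one evaluation of f per step, the predictor-corrector two and Runge-Kutta four, which is where the difference in running time comes from.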
For applications targeting accuracy and robustness, Runge-Kutta would undoubtedly be the method of choice. In our case, however, as the primary target is a fast, working implementation of a CNN simulator as an image processor, we shall choose the Euler method.
Handling special cases for increasing performance
It is not uncommon for templates that extract local properties of the image (like edge detectors) to use a fully zero A template. I have found that handling this special case separately speeds up processing dramatically.
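Given A = 0, the state equation takes the following form:
$$\frac{dX}{dt} = -X + BU + Z$$
(BU + Z) is constant, as U does not change during the process. Let: BU + Z = C
Using Euler integration with timestep h:
$$X(1) = X(0) + h\,(-X(0) + C) = (1-h)\,X(0) + hC$$
$$X(2) = (1-h)\,X(1) + hC = (1-h)^2\,X(0) + hC\,[(1-h) + 1]$$
$$X(3) = (1-h)\,X(2) + hC = (1-h)^3\,X(0) + hC\,[(1-h)^2 + (1-h) + 1]$$
The pattern can clearly be seen by now. In each new step X(0) gets multiplied by (1 - h), so its power index increases. The remaining part is a geometric series, so the general equation for calculating the n-th state is:
$$X(n) = (1-h)^n\,X(0) + hC \sum_{k=0}^{n-1} (1-h)^k$$
Using the general formula for the sum of a geometric series:
$$\sum_{k=0}^{n-1} q^k = \frac{1 - q^n}{1 - q}$$
the sum of the above geometric series turns into:
$$\sum_{k=0}^{n-1} (1-h)^k = \frac{1 - (1-h)^n}{h}$$
So the state equation using this will be:
$$X(n) = (1-h)^n\,X(0) + C\,\left[1 - (1-h)^n\right]$$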
This result is of utmost importance for the speed of the simulation, because the number of iterations that need to be performed is reduced to 1. We can get to the final state immediately, given the input U, the template B and the bias Z. As multiple iterations through an image cause many non-cacheable memory accesses (which are very slow), this improvement in the special case of A = 0 gives a huge boost in speed.
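The shortcut can be verified numerically for a single cell. The sketch below assumes the A = 0 state equation dX/dt = -X + C with C = BU + Z, runs n explicit Euler steps with timestep h, and compares the result with the closed form X(n) = (1 - h)^n X(0) + C [1 - (1 - h)^n] that follows from summing the geometric series. The function names are illustrative and are not taken from the plugin.

```python
def euler_iterated(x0: float, c: float, h: float, n: int) -> float:
    """n explicit Euler steps of dX/dt = -X + c, starting from x0."""
    x = x0
    for _ in range(n):
        x = x + h * (-x + c)
    return x


def euler_closed_form(x0: float, c: float, h: float, n: int) -> float:
    """The same n-th Euler state, computed in a single evaluation."""
    decay = (1.0 - h) ** n
    return decay * x0 + c * (1.0 - decay)


if __name__ == "__main__":
    x0, c, h, n = 0.2, -0.7, 0.1, 25
    print(euler_iterated(x0, c, h, n), euler_closed_form(x0, c, h, n))
```

Because C depends only on the input image, it can be computed in one pass over the pixels, after which the closed form gives the state for any n without revisiting the image.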
Realization of the CNN simulator
To get from theory to practice and implement a fast, reasonably correct CNN simulator within a limited timeframe, one capable of demonstrating some real-life candidates for practical CNN applications in the image processing field, the right combination of software and development tools had to be chosen. To speed up development and to skip the time-consuming task of implementing low-level image reading/writing/displaying code, the CNN simulator is realized as an Avisynth plugin.
AviSynth (http://www.avisynth.org) is a powerful tool for video post-production. It provides almost unlimited ways of editing and processing videos. AviSynth works as a frameserver, providing instant editing without the need for temporary files.
AviSynth itself does not provide a graphical user interface (GUI) but instead relies on a script system that allows advanced non-linear editing. While this may at first seem tedious and unintuitive, it is remarkably powerful and is a very good way to manage projects in a precise, consistent, and reproducible manner. Because text-based scripts are human readable, projects are inherently self-documenting. The scripting language is simple yet powerful, and complex filters can be created from basic operations to develop a sophisticated palette of useful and unique effects.
Avisynth uses a special programming language designed specifically for video processing. It is a scripting language, and its functions are implemented under the hood as C/C++ dynamic link libraries (DLLs), which are called plugins. These plugins expose an interface to the scripting language, through which their functions can be called. This allows very flexible use and parameterization.
Avisynth itself is composed of a huge collection of plugins. The CNN simulator is also implemented as a plugin. This makes it possible to use it from Avisynth's scripting language, so that any type of image or video that Avisynth can open can be processed easily. The programming language of choice was C++, and the plugin uses Avisynth's compiler-independent C interface. The following parameters can be set from within the scripting language:
– A template
– B template
– Z bias
– X initial state (either an initial image/clip or an initial value)
– U initial image, boundary conditions
– Timestep of Euler integration
– Number of iterations to perform
– Black color value
– White color value
The plugin operates in the YUY2 colorspace, which is very well suited for grayscale image processing. In the YUY2 colorspace, pixel data is separated into chrominance and luminance information and is arranged in the following way (each component being 8 bits long):
[Y0][Cb][Y1][Cr]
This block represents 2 pixels. Horizontal color resolution is half that of RGB24, but luminance resolution is unchanged.
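Since each 4-byte block carries two luminance samples at its even byte offsets, the grayscale image can be pulled out of a YUY2 row with a simple stride. The following Python sketch shows the idea on a raw byte string; the function name is made up for this illustration, and the actual plugin (written in C++) is not shown in this document.

```python
from typing import List


def luma_from_yuy2(row: bytes) -> List[int]:
    """Extract the luminance samples from one packed YUY2 row.

    Every 4-byte block holds two pixels; the luma bytes sit at the even
    offsets and the two chroma bytes at the odd offsets.
    """
    if len(row) % 4 != 0:
        raise ValueError("a YUY2 row must be a multiple of 4 bytes long")
    return list(row[0::2])


if __name__ == "__main__":
    two_blocks = bytes([16, 128, 235, 128, 81, 90, 145, 240])
    print(luma_from_yuy2(two_blocks))
```

For grayscale processing the chroma bytes can simply be ignored on input; how the plugin fills them on output is not described here.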
Each pixel's luminance ranges between 0 and 255 (8-bit grayscale). This range is then converted to a custom range (represented by 4-byte floating point values) specified by the Black and White color value script function parameters. Care needs to be taken to set these parameters correctly, as each CNN template operates in a very specific range, and it is the template designer's task to specify the value range when designing the templates.
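The range conversion can be pictured as a linear mapping. The sketch below assumes that luminance 0 maps to the Black value and 255 maps to the White value, with linear interpolation in between, and that values are clipped when converting back; the document does not state the exact formula used by the plugin, so this is only a plausible reading. The function names are hypothetical.

```python
def luma_to_cnn(luma: int, black: float, white: float) -> float:
    """Map an 8-bit luminance (0..255) linearly onto the range [black, white]."""
    return black + (white - black) * (luma / 255.0)


def cnn_to_luma(value: float, black: float, white: float) -> int:
    """Inverse mapping, clipped to the valid 8-bit range."""
    t = (value - black) / (white - black)
    t = max(0.0, min(1.0, t))
    return round(255.0 * t)


if __name__ == "__main__":
    # Range used in the example script: black = -1, white = 1.
    for luma in (0, 64, 128, 255):
        print(luma, luma_to_cnn(luma, -1.0, 1.0))
```

Because the mapping is written in terms of the two end points, it also covers templates designed for the opposite convention (for example black = 1, white = -1) without any change.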
Here is an example script that performs simple black-and-white edge detection (it also requires the adaptive B/W conversion plugin that accompanies the CNN plugin):
LoadCPlugin("C:\Program Files\AviSynth 2.5\plugins\c\cnn.dll")
LoadCPlugin("C:\Program Files\AviSynth 2.5\plugins\c\adaptivebw.dll")
DirectShowSource("C:\video.mpg")
# BW edge
AdaptiveBW()
CNN(
\ "A",
\ 0,0,0,
\ 0,0,0,
\ 0,0,0,
\ "B",
\ -1,-1,-1,
\ -1, 8,-1,
\ -1,-1,-1,
\ Z=-1.0,
\ XInitValue=0,
\ YInitValue=0,
\ timestep=1,
\ iterations=1,
\ black=-1,
\ white=1
\ )
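To see what these parameter values compute, the following Python sketch reproduces the arithmetic on a tiny test image. With A = 0, an initial state of 0, timestep 1 and a single iteration, one Euler step yields X = BU + Z, and the output is that state clamped to [-1, 1]. The zero-flux boundary handling and the test image are assumptions of this sketch, and the code is not the plugin's implementation.

```python
from typing import List

Grid = List[List[float]]

# Template and bias values as used in the edge detection script.
B_EDGE: Grid = [[-1.0, -1.0, -1.0],
                [-1.0,  8.0, -1.0],
                [-1.0, -1.0, -1.0]]
Z = -1.0


def edge_detect(u: Grid) -> Grid:
    """One Euler step with timestep 1 from X = 0 and A = 0, then the output clamp."""
    rows, cols = len(u), len(u[0])
    y: Grid = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            state = Z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # Assumption: zero-flux boundary (indices clamped to the grid).
                    r = min(max(i + di, 0), rows - 1)
                    c = min(max(j + dj, 0), cols - 1)
                    state += B_EDGE[di + 1][dj + 1] * u[r][c]
            out_row.append(max(-1.0, min(1.0, state)))
        y.append(out_row)
    return y


# 5x5 test image: a white (+1) 3x3 square on a black (-1) background.
image = [[1.0 if 1 <= i <= 3 and 1 <= j <= 3 else -1.0 for j in range(5)]
         for i in range(5)]
edges = edge_detect(image)

if __name__ == "__main__":
    for row in edges:
        print(" ".join("#" if v > 0 else "." for v in row))
```

With black = -1 and white = 1, only the outline of the white square stays white; the interior of the square and the background both come out black.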
Real-time processing of captured video:
To achieve direct real-time processing of captured video, the ffdshow raw video DirectShow filter can be used with its built-in Avisynth post-processing capability.
Speed, accuracy and limitations of the CNN simulator plugin
As mentioned earlier, every CNN simulator has to trade off speed against accuracy. During the development of the CNN Avisynth plugin, the primary target was the traditional PC, nowadays available in every household. As such, no specific hardware was expected to be available to aid the processing of the CNN simulator. This means that the simulator itself is not as robust as other, specialized CNN simulators, but it is decently accurate and fast. By selecting the right timestep and iteration count for a given template, very good and accurate results can be achieved.
I cannot stress enough that the most important parameters of the CNN simulator are the timestep and iteration count (along with the cloning templates, of course). For fine results, choose a relatively small timestep (around 0.1-0.2) and a high iteration count (around 20-40). This will most likely give good results, but processing speed will drop to a crawl. For quasi real-time performance (15-25 FPS), set the iteration count to 1-5 and increase the timestep parameter accordingly. Also, make sure not to process images at too high a resolution. Resolutions around 400x300 with an iteration count of 1-5 should give good performance.
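For the fully zero A template case, the interplay of timestep and iteration count can be estimated directly. Assuming the state equation dX/dt = -X + C for that case, each Euler step multiplies the remaining distance to the final state by (1 - timestep), so the sketch below computes how much of that distance is left after n steps and how many steps a given tolerance requires. This reasoning does not carry over to templates with feedback (A not zero), and the function names are invented for this example.

```python
import math


def residual_after(h: float, n: int) -> float:
    """Fraction of the initial distance to the final state left after n Euler steps."""
    return abs(1.0 - h) ** n


def iterations_needed(h: float, tolerance: float) -> int:
    """Smallest n with residual_after(h, n) <= tolerance (A = 0 case only)."""
    if not 0.0 < h < 2.0:
        raise ValueError("explicit Euler does not converge for this timestep")
    if not 0.0 < tolerance < 1.0:
        raise ValueError("tolerance must lie strictly between 0 and 1")
    if h == 1.0:
        return 1  # a single step lands exactly on the final state
    return math.ceil(math.log(tolerance) / math.log(abs(1.0 - h)))


if __name__ == "__main__":
    for h, n in ((0.1, 20), (0.1, 40), (0.2, 20), (0.2, 40), (1.0, 1)):
        print(f"timestep={h}, iterations={n}: residual={residual_after(h, n):.4f}")
```

Under this assumption, a larger timestep needs far fewer iterations for the same residual, which fits the advice to raise the timestep when lowering the iteration count.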
It has to be noted that when the A template is fully 0, the modified form of the state equation is used (as presented in Handling special cases for increasing performance). So the iterations parameter no longer specifies the number of iterations, as that is always 1 in this case, but is used in a different manner. See the noted section for details.
Though the code is decently optimized, a great deal of floating point calculation is required, as well as many memory accesses, which has a huge impact on performance.
The simulator correctly simulates the state-transition rule using the Euler integration method. Though it is the fastest method, it may not give the most accurate result. This limitation follows from the design choice of favoring speed.
An up-to-date version of this document can be found at:
http://digitus.itk.ppke.hu/~oroba/cnn/
along with other useful information and the CNN simulator program itself.
Make sure to check out the following directories:
http://digitus.itk.ppke.hu/~oroba/cnn/documentation/ (ppt presentation!)
http://digitus.itk.ppke.hu/~oroba/cnn/pictures/ (screenshots!)
References:
Cellular Neural Networks: A Paradigm for Nonlinear Spatio-Temporal Processing by Luigi Fortuna, Paolo Arena, Dávid Bálya and Ákos Zarándy
Behavioral Modeling and Simulation of CNN Chips by R. Carmona, R. Domínguez-Castro, S. Espejo and A. Rodríguez-Vázquez