Neural Networks
5 Marks
1) The term neural network was traditionally used to refer to a network or circuit of biological neurons.[1] The modern usage of the term often refers to artificial neural networks, which are composed of artificial neurons or nodes. Thus the term has two distinct usages:
1. Biological neural networks are made up of real biological neurons that are connected or functionally related in a nervous system. In the field of neuroscience, they are often identified as groups of neurons that perform a specific physiological function in laboratory analysis.
2. Artificial neural networks are composed of interconnecting artificial neurons (programming constructs that mimic the properties of biological neurons). Artificial neural networks may be used either to gain an understanding of biological neural networks or to solve artificial intelligence problems without necessarily creating a model of a real biological system. The real, biological nervous system is highly complex: artificial neural network algorithms attempt to abstract this complexity and focus on what may hypothetically matter most from an information processing point of view. Good performance (e.g. as measured by good predictive ability, low generalization error), or performance mimicking animal or human error patterns, can then be used as one source of evidence towards supporting the hypothesis that the abstraction really captured something important from the point of view of information processing in the brain. Another incentive for these abstractions is to reduce the amount of computation required to simulate artificial neural networks, so as to allow one to experiment with larger networks and train them on larger data sets.
2) Neural networks and artificial intelligence
A neural network (NN), in the case of artificial neurons called an artificial neural network (ANN) or simulated neural network (SNN), is an interconnected group of natural or artificial neurons that uses a mathematical or computational model for information processing based on a connectionistic approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network.
In more practical terms, neural networks are non-linear statistical data modeling or decision making tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.
However, the paradigm of neural networks (i.e., implicit, not explicit, learning is stressed) seems to correspond more to some kind of natural intelligence than to the traditional symbol-based Artificial Intelligence, which would instead stress rule-based learning.
Background
An artificial neural network involves a network of simple processing elements (artificial neurons) which can exhibit complex global behavior, determined by the connections between the processing elements and element parameters. Artificial neurons were first proposed in 1943 by Warren McCulloch, a neurophysiologist, and Walter Pitts, a logician, who first collaborated at the University of Chicago.[19] One classical type of artificial neural network is the recurrent Hopfield net.
In a neural network model, simple nodes (which can be called by a number of names, including "neurons", "neurodes", "Processing Elements" (PE) and "units") are connected together to form a network of nodes, hence the term "neural network". While a neural network does not have to be adaptive per se, its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow.
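As a rough illustration of how an algorithm can alter connection weights, the sketch below trains a single artificial neuron with the classic perceptron rule on the logical AND function. The function names, the learning rate and the training data are assumptions chosen for the example, not something taken from these notes.

```python
def step(x):
    """Threshold activation: the neuron fires (1) when its net input is non-negative."""
    return 1 if x >= 0 else 0


def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, passed through the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge each weight in proportion to the output error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Hypothetical training set: the logical AND of two binary inputs.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
outputs = [predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
```

After training, `outputs` reproduces the AND targets. A single neuron of this kind can only separate classes with a straight line, which is one reason practical networks connect many nodes together.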
In modern software implementations of artificial neural networks, the approach inspired by biology has more or less been abandoned for a more practical approach based on statistics and signal processing. In some of these systems, neural networks, or parts of neural networks (such as artificial neurons), are used as components in larger systems that combine both adaptive and non-adaptive elements.
The concept of a neural network appears to have first been proposed by Alan Turing in his 1948 paper "Intelligent Machinery".
Applications of natural and of artificial neural networks
The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations and also to use it. Unsupervised neural networks can also be used to learn representations of the input that capture the salient characteristics of the input distribution, e.g., see the Boltzmann machine (1983), and more recently, deep learning algorithms, which can implicitly learn the distribution function of the observed data. Learning in neural networks is particularly useful in applications where the complexity of the data or task makes the design of such functions by hand impractical.
The tasks to which artificial neural networks are applied tend to fall within the following broad
categories:
Function approximation, or regression analysis, including time series prediction and modeling.
Classification, including pattern and sequence recognition, novelty detection and sequential decision making.
Data processing, including filtering, clustering, blind signal separation and compression.
Application areas of ANNs include system identification and control (vehicle control, process control), game-playing and decision making (backgammon, chess, racing), pattern recognition (radar systems, face identification, object recognition), sequence recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial applications, data mining (or knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering.
3) Artificial intelligence (AI) is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents"[1] where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success.[2] John McCarthy, who coined the term in 1955,[3] defines it as "the science and engineering of making intelligent machines."[4]
AI research is highly technical and specialized, deeply divided into subfields that often fail to communicate with each other.[5] Some of the division is due to social and cultural factors: subfields have grown up around particular institutions and the work of individual researchers. AI research is also divided by several technical issues. There are subfields which are focused on the solution of specific problems, on one of several possible approaches, on the use of widely differing tools and towards the accomplishment of particular applications. The central problems of AI include such traits as reasoning, knowledge, planning, learning, communication, perception and the ability to move and manipulate objects.[6] General intelligence (or "strong AI") is still among the field's long term goals.[7] Currently popular approaches include statistical methods, computational intelligence and traditional symbolic AI. There are an enormous number of tools used in AI, including versions of search and mathematical optimization, logic, methods based on probability and economics, and many others.
The field was founded on the claim that a central property of humans, intelligence (the sapience of Homo sapiens), can be so precisely described that it can be simulated by a machine.[8] This raises philosophical issues about the nature of the mind and the ethics of creating artificial beings, issues which have been addressed by myth, fiction and philosophy since antiquity.[9] Artificial intelligence has been the subject of optimism,[10] but has also suffered setbacks[11] and, today, has become an essential part of the technology industry, providing the heavy lifting for many of the most difficult problems in computer science.[12]
20 Marks
1) Artificial intelligence (AI) is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents"[1] where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success.
Problems
The general problem of simulating (or creating) intelligence has been broken down into a number of specific sub-problems. These consist of particular traits or capabilities that researchers would like an intelligent system to display. The traits described below have received the most attention.[6]
Deduction, reasoning, problem solving
Early AI researchers developed algorithms that imitated the step-by-step reasoning that humans use when they solve puzzles or make logical deductions.[39] By the late 1980s and '90s, AI research had also developed highly successful methods for dealing with uncertain or incomplete information, employing concepts from probability and economics.[40]
For difficult problems, most of these algorithms can require enormous computational resources: most experience a "combinatorial explosion", in which the amount of memory or computer time required becomes astronomical when the problem goes beyond a certain size. The search for more efficient problem-solving algorithms is a high priority for AI research.[41]
Human beings solve most of their problems using fast, intuitive judgments rather than the conscious, step-by-step deduction that early AI research was able to model.[42] AI has made some progress at imitating this kind of "sub-symbolic" problem solving: embodied agent approaches emphasize the importance of sensorimotor skills to higher reasoning; neural net research attempts to simulate the structures inside human and animal brains that give rise to this skill; statistical approaches to AI mimic the probabilistic nature of the human ability to guess.
Knowledge representation
Knowledge representation[43] and knowledge engineering[44] are central to AI research. Many of the problems machines are expected to solve will require extensive knowledge about the world. Among the things that AI needs to represent are: objects, properties, categories and relations between objects;[45] situations, events, states and time;[46] causes and effects;[47] knowledge about knowledge (what we know about what other people know);[48] and many other, less well researched domains. A representation of "what exists" is an ontology (borrowing a word from traditional philosophy), of which the most general are called upper ontologies.[49]
Among the most difficult problems in knowledge representation are:
Default reasoning and the qualification problem
Many of the things people know take the form of "working assumptions." For example, if a bird comes up in conversation, people typically picture an animal that is fist sized, sings, and flies. None of these things are true about all birds. John McCarthy identified this problem in 1969[50] as the qualification problem: for any commonsense rule that AI researchers care to represent, there tend to be a huge number of exceptions. Almost nothing is simply true or false in the way that abstract logic requires. AI research has explored a number of solutions to this problem.[51]
The breadth of commonsense knowledge
The number of atomic facts that the average person knows is astronomical. Research projects that attempt to build a complete knowledge base of commonsense knowledge (e.g., Cyc) require enormous amounts of laborious ontological engineering: they must be built, by hand, one complicated concept at a time.[52] A major goal is to have the computer understand enough concepts to be able to learn by reading from sources like the internet, and thus be able to add to its own ontology.[citation needed]
The subsymbolic form of some commonsense knowledge
Much of what people know is not represented as "facts" or "statements" that they could express verbally. For example, a chess master will avoid a particular chess position because it "feels too exposed"[53] or an art critic can take one look at a statue and instantly realize that it is a fake.[54] These are intuitions or tendencies that are represented in the brain non-consciously and sub-symbolically.[55] Knowledge like this informs, supports and provides a context for symbolic, conscious knowledge. As with the related problem of sub-symbolic reasoning, it is hoped that situated AI, computational intelligence, or statistical AI will provide ways to represent this kind of knowledge.[55]
Planning
Intelligent agents must be able to set goals and achieve them.[56] They need a way to visualize the future (they must have a representation of the state of the world and be able to make predictions about how their actions will change it) and be able to make choices that maximize the utility (or "value") of the available choices.[57]
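A minimal sketch of choosing the action with the highest expected utility is given below. The action names, probabilities and utility values are invented for illustration only; a real agent would obtain them from its model of the world.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over the possible outcomes of one action."""
    return sum(probability * utility for probability, utility in outcomes)


def best_action(actions):
    """Return the name of the action whose predicted outcomes are worth the most."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Hypothetical predictions: each action maps to (probability, utility) pairs.
ACTIONS = {
    "take_highway": [(0.8, 10.0), (0.2, -5.0)],
    "take_side_road": [(1.0, 6.0)],
}

choice = best_action(ACTIONS)
```

Here the risky action is still preferred, because its expected utility (0.8 * 10 - 0.2 * 5 = 7) exceeds the certain payoff of 6.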
In classical planning problems, the agent can assume that it is the only thing acting on the world and it can be certain what the consequences of its actions may be.[58] However, if the agent is not the only actor, it must periodically ascertain whether the world matches its predictions and it must change its plan as this becomes necessary, requiring the agent to reason under uncertainty.[59]
Multi-agent planning uses the cooperation and competition of many agents to achieve a given goal. Emergent behavior such as this is used by evolutionary algorithms and swarm intelligence.[60]
Learning
Machine learning[61] has been central to AI research from the beginning.[62] In 1956, at the original Dartmouth AI summer conference, Ray Solomonoff wrote a report on unsupervised probabilistic machine learning: "An Inductive Inference Machine".[63] Unsupervised learning is the ability to find patterns in a stream of input. Supervised learning includes both classification and numerical regression. Classification is used to determine what category something belongs in, after seeing a number of examples of things from several categories. Regression is the attempt to produce a function that describes the relationship between inputs and outputs and predicts how the outputs should change as the inputs change. In reinforcement learning[64] the agent is rewarded for good responses and punished for bad ones. These can be analyzed in terms of decision theory, using concepts like utility. The mathematical analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory.[65]
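To make the idea of regression concrete, the sketch below fits a straight line to a handful of points by ordinary least squares. The data are made up (they lie exactly on y = 2x + 1) and the function name is an assumption for this example; least squares is only one of many regression techniques.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predicted output for an unseen input
```

The fitted function describes how the output changes with the input (the slope) and can be used to predict outputs for inputs that were not in the training examples.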
Natural language processing
Natural language processing[66] gives machines the ability to read and understand the languages that humans speak. A sufficiently powerful natural language processing system would enable natural language user interfaces and the acquisition of knowledge directly from human-written sources, such as Internet texts. Some straightforward applications of natural language processing include information retrieval (or text mining) and machine translation.[67]
A common method of processing and extracting meaning from natural language is through semantic indexing. Increases in processing speeds and the drop in the cost of data storage make indexing large volumes of abstractions of the user's input much more efficient.
Motion and manipulation
The field of robotics[68] is closely related to AI. Intelligence is required for robots to be able to handle such tasks as object manipulation[69] and navigation, with sub-problems of localization (knowing where you are), mapping (learning what is around you) and motion planning (figuring out how to get there).[70]
Perception
Machine perception[71] is the ability to use input from sensors (such as cameras, microphones, sonar and others more exotic) to deduce aspects of the world. Computer vision[72] is the ability to analyze visual input. A few selected subproblems are speech recognition,[73] facial recognition and object recognition.[74]
Social intelligence
Affective computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects.[76][77] It is an interdisciplinary field spanning computer sciences, psychology, and cognitive science.[78] While the origins of the field may be traced as far back as to early philosophical enquiries into emotion,[79] the more modern branch of computer science originated with Rosalind Picard's 1995 paper[80] on affective computing.[81][82] A motivation for the research is the ability to simulate empathy. The machine should interpret the emotional state of humans and adapt its behaviour to them, giving an appropriate response for those emotions.
Emotion and social skills[83] play two roles for an intelligent agent. First, it must be able to predict the actions of others, by understanding their motives and emotional states. (This involves elements of game theory, decision theory, as well as the ability to model human emotions and the perceptual skills to detect emotions.) Also, in an effort to facilitate human-computer interaction, an intelligent machine might want to be able to display emotions, even if it does not actually experience them itself, in order to appear sensitive to the emotional dynamics of human interaction.
Creativity
A sub-field of AI addresses creativity both theoretically (from a philosophical and psychological perspective) and practically (via specific implementations of systems that generate outputs that can be considered creative, or systems that identify and assess creativity). Related areas of computational research are Artificial intuition and Artificial imagination.[citation needed]
General intelligence
Most researchers hope that their work will eventually be incorporated into a machine with general intelligence (known as strong AI), combining all the skills above and exceeding human abilities at most or all of them.[7] A few believe that anthropomorphic features like artificial consciousness or an artificial brain may be required for such a project.[84][85]
Many of the problems above are considered AI-complete: to solve one problem, you must solve them all. For example, even a straightforward, specific task like machine translation requires that the machine follow the author's argument (reason), know what is being talked about (knowledge), and faithfully reproduce the author's intention (social intelligence). Machine translation, therefore, is believed to be AI-complete: it may require strong AI to be done as well as humans can do it.
2) Classical control theory
To avoid the problems of the open-loop controller, control theory introduces feedback. A closed-loop controller uses feedback to control states or outputs of a dynamical system. Its name comes from the information path in the system: process inputs (e.g., voltage applied to an electric motor) have an effect on the process outputs (e.g., speed or torque of the motor), which is measured with sensors and processed by the controller; the result (the control signal) is used as input to the process, closing the loop.
Closed-loop controllers have the following advantages over open-loop controllers:
disturbance rejection (such as unmeasured friction in a motor)
guaranteed performance even with model uncertainties, when the model structure does not match perfectly the real process and the model parameters are not exact
unstable processes can be stabilized
reduced sensitivity to parameter variations
improved reference tracking performance
In some systems, closed-loop and open-loop control are used simultaneously. In such systems, the open-loop control is termed feedforward and serves to further improve reference tracking performance.
A common closed-loop controller architecture is the PID controller.
Closed-loop transfer function
The output of the system y(t) is fed back through a sensor measurement F to the reference value r(t). The controller C then takes the error e (difference) between the reference and the output to change the inputs u to the system under control P. This is shown in the figure. This kind of controller is a closed-loop controller or feedback controller.
This is called a single-input-single-output (SISO) control system; MIMO (i.e., Multi-Input-Multi-Output) systems, with more than one input/output, are common. In such cases variables are represented through vectors instead of simple scalar values. For some distributed parameter systems the vectors may be infinite-dimensional (typically functions).
If we assume the controller C, the plant P, and the sensor F are linear and time-invariant (i.e., elements of their transfer function C(s), P(s), and F(s) do not depend on time), the systems above can be analysed using the Laplace transform on the variables. This gives the following relations:
$$Y(s) = P(s)\,U(s)$$
$$U(s) = C(s)\,E(s)$$
$$E(s) = R(s) - F(s)\,Y(s)$$
Solving for Y(s) in terms of R(s) gives:
$$Y(s) = \left( \frac{P(s)C(s)}{1 + F(s)P(s)C(s)} \right) R(s) = H(s)\,R(s)$$
The expression $H(s) = \frac{P(s)C(s)}{1 + F(s)P(s)C(s)}$ is referred to as the closed-loop transfer function of the system. The numerator is the forward (open-loop) gain from r to y, and the denominator is one plus the gain in going around the feedback loop, the so-called loop gain. If $|P(s)C(s)| \gg 1$, i.e., it has a large norm with each value of s, and if $|F(s)| \approx 1$, then Y(s) is approximately equal to R(s) and the output closely tracks the reference input.
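The tracking claim can be checked numerically with a small sketch. It assumes the standard closed-loop formula P(s)C(s) / (1 + F(s)P(s)C(s)) implied by the block description above, a hypothetical first-order plant P(s) = 1/(s + 1), an ideal sensor F(s) = 1 and a pure gain as controller; none of these specific choices come from the notes.

```python
def closed_loop(P, C, F, s):
    """Closed-loop transfer function P*C / (1 + F*P*C) evaluated at the complex point s."""
    forward = P(s) * C(s)
    return forward / (1 + F(s) * forward)


def plant(s):
    """Hypothetical first-order plant P(s) = 1 / (s + 1)."""
    return 1.0 / (s + 1.0)


def ideal_sensor(s):
    """Sensor with unit gain, F(s) = 1."""
    return 1.0


def gain(k):
    """Build a constant-gain controller C(s) = k."""
    return lambda s: k


s = 0.1j  # evaluate at a low frequency of 0.1 rad/s
low_gain = closed_loop(plant, gain(1.0), ideal_sensor, s)
high_gain = closed_loop(plant, gain(1000.0), ideal_sensor, s)
```

With the small gain the magnitude of the closed-loop response is only about one half, while with the large loop gain it is very close to one, so the output follows the reference almost exactly.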
PID controller
The PID controller is probably the most-used feedback control design. PID is an acronym for Proportional-Integral-Derivative, referring to the three terms operating on the error signal to produce a control signal. If u(t) is the control signal sent to the system, y(t) is the measured output and r(t) is the desired output, and the tracking error is $e(t) = r(t) - y(t)$, a PID controller has the general form
$$u(t) = K_P\, e(t) + K_I \int e(\tau)\, d\tau + K_D \frac{d}{dt} e(t)$$
The desired closed loop dynamics is obtained by adjusting the three parameters $K_P$, $K_I$ and $K_D$, often iteratively by "tuning" and without specific knowledge of a plant model. Stability can often be ensured using only the proportional term. The integral term permits the rejection of a step disturbance (often a striking specification in process control). The derivative term is used to provide damping or shaping of the response. PID controllers are the most well established class of control systems: however, they cannot be used in several more complicated cases, especially if MIMO systems are considered.
Applying the Laplace transformation results in the transformed PID controller equation
$$u(s) = K_P\, e(s) + K_I \frac{1}{s}\, e(s) + K_D\, s\, e(s) = \left( K_P + K_I \frac{1}{s} + K_D\, s \right) e(s)$$
with the PID controller transfer function
$$C(s) = K_P + K_I \frac{1}{s} + K_D\, s$$
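The sketch below is one possible discrete-time version of a PID controller, driving a hypothetical first-order plant dy/dt = -y + u simulated with forward Euler steps. The class name, the gains, the time step and the plant are all assumptions for illustration; the integral is approximated by a running sum and the derivative by a backward difference.

```python
class PID:
    """Discrete-time PID controller (running-sum integral, backward-difference derivative)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample yet, so skip the derivative term
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(controller, reference=1.0, steps=2000):
    """Run the loop on the assumed plant dy/dt = -y + u and return the final output."""
    y = 0.0
    for _ in range(steps):
        u = controller.update(reference, y)
        y += controller.dt * (-y + u)
    return y


p_only = simulate(PID(kp=2.0, ki=0.0, kd=0.0, dt=0.01))
full_pid = simulate(PID(kp=2.0, ki=2.0, kd=0.05, dt=0.01))
```

With the proportional term alone the output settles at kp / (1 + kp) = 2/3 of the reference for this plant, leaving a steady-state error. Adding the integral term drives the output to the reference, which illustrates why the integral term is associated with rejecting constant offsets.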
Modern control theory
In contrast to the frequency domain analysis of the classical control theory, modern control theory utilizes the time-domain state space representation, a mathematical model of a physical system as a set of input, output and state variables related by first-order differential equations. To abstract from the number of inputs, outputs and states, the variables are expressed as vectors and the differential and algebraic equations are written in matrix form (the latter only being possible when the dynamical system is linear). The state space representation (also known as the "time-domain approach") provides a convenient and compact way to model and analyze systems with multiple inputs and outputs. With $p$ inputs and $q$ outputs, we would otherwise have to write down $q \times p$ Laplace transforms to encode all the information about a system. Unlike the frequency domain approach, the use of the state space representation is not limited to systems with linear components and zero initial conditions. "State space" refers to the space whose axes are the state variables. The state of the system can be represented as a vector within that space.[5]
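As an illustration of the state space form x' = A x + B u, the sketch below simulates a mass-spring-damper with the state vector (position, velocity) using forward Euler integration. The numerical values of mass, stiffness and damping, the constant input force and the step size are assumptions made only for this example.

```python
def simulate_state_space(A, B, x0, u, dt, steps):
    """Integrate x' = A x + B u with forward Euler for a constant scalar input u."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
    return x


# Hypothetical parameters: mass m, spring stiffness k, damping coefficient b.
m, k, b = 1.0, 4.0, 2.0

# State x = [position, velocity]; m * acceleration = -k * position - b * velocity + force.
A = [[0.0, 1.0],
     [-k / m, -b / m]]
B = [0.0, 1.0 / m]

final_state = simulate_state_space(A, B, x0=[0.0, 0.0], u=4.0, dt=0.001, steps=20000)
```

After the transient has died out, the position settles where the spring force balances the applied force (force / k = 1) and the velocity returns to zero. The same function works unchanged for systems with more states, which is the compactness the matrix form provides.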
System classifications
Linear systems control
For MIMO systems, pole placement can be performed mathematically using a state space representation of the open-loop system and calculating a feedback matrix assigning poles in the desired positions. In complicated systems this can require computer-assisted calculation capabilities, and cannot always ensure robustness. Furthermore, not all system states are in general measured, and so observers must be included and incorporated in pole placement design.
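For a very small case, pole placement can be done by hand. The sketch below assumes a two-state single-input system already written in companion form (x1' = x2, x2' = -a0 x1 - a1 x2 + u), state feedback u = -(k1 x1 + k2 x2), and real desired poles; the function names and the numbers are hypothetical. Matching the closed-loop characteristic polynomial s^2 + (a1 + k2) s + (a0 + k1) to (s - p1)(s - p2) gives the gains directly.

```python
import cmath


def place_poles_companion(a0, a1, p1, p2):
    """Feedback gains (k1, k2) that move the poles of the companion-form system to p1, p2."""
    k1 = p1 * p2 - a0
    k2 = -(p1 + p2) - a1
    return k1, k2


def closed_loop_poles(a0, a1, k1, k2):
    """Roots of s^2 + (a1 + k2) s + (a0 + k1), the closed-loop characteristic polynomial."""
    c1, c0 = a1 + k2, a0 + k1
    root = cmath.sqrt(c1 * c1 - 4.0 * c0)
    return (-c1 + root) / 2.0, (-c1 - root) / 2.0


# Hypothetical example: a double integrator (a0 = a1 = 0) with desired poles at -2 and -3.
k1, k2 = place_poles_companion(a0=0.0, a1=0.0, p1=-2.0, p2=-3.0)
poles = closed_loop_poles(0.0, 0.0, k1, k2)
```

Larger or MIMO systems need general algorithms and numerical software, and, as noted above, the gains found this way say nothing about robustness and presuppose that both states are measured or estimated.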
Nonlinear systems control
Processes in industries like robotics and the aerospace industry typically have strong nonlinear dynamics. In control theory it is sometimes possible to linearize such classes of systems and apply linear techniques, but in many cases it can be necessary to devise from scratch theories permitting control of nonlinear systems. These, e.g., feedback linearization, backstepping, sliding mode control, trajectory linearization control, normally take advantage of results based on Lyapunov's theory. Differential geometry has been widely used as a tool for generalizing well-known linear control concepts to the non-linear case, as well as showing the subtleties that make it a more challenging problem.
Decentralized systems
When the system is controlled by multiple controllers, the problem is one of decentralized control. Decentralization is helpful in many ways, for instance, it helps control systems operate over a larger geographical area. The agents in decentralized control systems can interact using communication channels and coordinate their actions.
3) Topics in control theory
Stability
The stability of a general dynamical system with no input can be described with Lyapunov stability criteria. A linear system that takes an input is called bounded-input bounded-output (BIBO) stable if its output will stay bounded for any bounded input. Stability for nonlinear systems that take an input is input-to-state stability (ISS), which combines Lyapunov stability and a notion similar to BIBO stability. For simplicity, the following descriptions focus on continuous-time and discrete-time linear systems.
Mathematically, this means that for a causal linear system to be stable all of the poles of its transfer function must have negative real parts, i.e. the real part of every pole must be less than zero. Practically speaking, stability requires that the complex poles of the transfer function reside
in the open left half of the complex plane for continuous time, when the Laplace transform is used to obtain the transfer function.
inside the unit circle for discrete time, when the Z-transform is used.
The difference between the two cases is simply due to the traditional method of plotting continuous time versus discrete time transfer functions. The continuous Laplace transform is in Cartesian coordinates where the $x$ axis is the real axis and the discrete Z-transform is in circular coordinates where the $\rho$ axis is the real axis.
When the appropriate conditions above are satisfied a system is said to be asymptotically stable: the variables of an asymptotically stable control system always decrease from their initial value and do not show permanent oscillations. Permanent oscillations occur when a pole has a real part exactly equal to zero (in the continuous time case) or a modulus equal to one (in the discrete time case). If a simply stable system response neither decays nor grows over time, and has no oscillations, it is marginally stable: in this case the system transfer function has non-repeated poles at the complex plane origin (i.e. their real and imaginary components are zero in the continuous time case). Oscillations are present when poles with real part equal to zero have an imaginary part not equal to zero.
If a system in question has an impulse response of
$$x[n] = 0.5^n u[n]$$
then the Z-transform is given by
$$X(z) = \frac{1}{1 - 0.5 z^{-1}}$$
which has a pole in $z = 0.5$ (zero imaginary part). This system is BIBO (asymptotically) stable since the pole is inside the unit circle.
However, if the impulse response was
$$x[n] = 1.5^n u[n]$$
then the Z-transform is
$$X(z) = \frac{1}{1 - 1.5 z^{-1}}$$
which has a pole at $z = 1.5$ and is not BIBO stable since the pole has a modulus strictly greater than one.
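The unit-circle test for discrete-time systems is easy to check numerically. The sketch below uses impulse responses of the form a^n with a = 0.5 and a = 1.5 as illustrative values; the function names are assumptions. For such a system the pole sits at z = a, and BIBO stability corresponds to the impulse response being absolutely summable.

```python
def is_bibo_stable(poles):
    """A discrete-time LTI system is BIBO stable when every pole lies strictly inside the unit circle."""
    return all(abs(p) < 1.0 for p in poles)


def impulse_response_sum(a, n_terms):
    """Partial sum of |a|^n, the absolute impulse response of x[n] = a^n u[n]."""
    return sum(abs(a) ** n for n in range(n_terms))


stable_case = is_bibo_stable([0.5])      # pole inside the unit circle
unstable_case = is_bibo_stable([1.5])    # pole outside the unit circle

bounded_sum = impulse_response_sum(0.5, 50)    # stays below 1 / (1 - 0.5) = 2
growing_sum = impulse_response_sum(1.5, 50)    # keeps growing with more terms
```

The partial sums for a = 0.5 never exceed 2 however many terms are added, whereas for a = 1.5 they grow without bound, which matches the pole-location criterion.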
Numerous tools exist for the analysis of the poles of a system. These include graphical systems like the root locus, Bode plots or the Nyquist plots.
Mechanical changes can make equipment (and control systems) more stable. Sailors add ballast
to improve the stability of ships. Cruise ships use
antiroll fins that extend transversely from the
side of the ship for perhaps 30 feet (10 m) and are continuously rotated about their axes to
develop forces that oppose the roll.
Controllability and observability
Controllability and observability are main issues in the analysis of a system before deciding the best control strategy to be applied, or whether it is even possible to control or stabilize the system. Controllability is related to the possibility of forcing the system into a particular state by using an appropriate control signal. If a state is not controllable, then no signal will ever be able to control the state. If a state is not controllable, but its dynamics are stable, then the state is termed stabilizable. Observability instead is related to the possibility of "observing", through output measurements, the state of a system. If a state is not observable, the controller will never be able to determine the behaviour of an unobservable state and hence cannot use it to stabilize the system. However, similar to the stabilizability condition above, if a state cannot be observed it might still be detectable.
From a geometrical point of view, looking at the states of each variable of the system to be controlled, every "bad" state of these variables must be controllable and observable to ensure a good behaviour in the closed-loop system. That is, if one of the eigenvalues of the system is not both controllable and observable, this part of the dynamics will remain untouched in the closed-loop system. If such an eigenvalue is not stable, the dynamics of this eigenvalue will be present in the closed-loop system which therefore will be unstable. Unobservable poles are not present in the transfer function realization of a state-space representation, which is why sometimes the latter is preferred in dynamical systems analysis.
Solutions to problems of uncontrollable or unobservable system include adding actuators and sensors.
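For a linear system x' = A x + B u, one standard way to test controllability is the Kalman rank condition: the matrix [B, AB, ...] must have full rank. The sketch below applies it to two-state, single-input systems, where full rank reduces to a non-zero 2x2 determinant. The function names and both example systems are assumptions for illustration.

```python
def controllability_matrix(A, B):
    """Controllability matrix [B, A*B] for a two-state, single-input system."""
    AB = [A[0][0] * B[0] + A[0][1] * B[1],
          A[1][0] * B[0] + A[1][1] * B[1]]
    return [[B[0], AB[0]],
            [B[1], AB[1]]]


def is_controllable(A, B, tol=1e-12):
    """Full rank of the 2x2 controllability matrix means a non-zero determinant."""
    M = controllability_matrix(A, B)
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return abs(det) > tol


# Double integrator: the input reaches both states, so it is controllable.
double_integrator = is_controllable([[0.0, 1.0], [0.0, 0.0]], [0.0, 1.0])

# Two decoupled modes where the input only drives the second one:
# the unstable mode (eigenvalue +1) cannot be influenced at all.
decoupled = is_controllable([[1.0, 0.0], [0.0, -2.0]], [0.0, 1.0])
```

In the second system the uncontrollable eigenvalue is unstable, so no feedback through this input can stabilize it; adding an actuator that acts on the first state (a non-zero first entry in B) would restore controllability.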
Control specification
Several different control strategies have been devised in the past years. These vary from extremely general ones (PID controller), to others devoted to very particular classes of systems (especially robotics or aircraft cruise control).
A control problem can have several specifications. Stability, of course, is always present: the controller must ensure that the closed-loop system is stable, regardless of the open-loop stability. A poor choice of controller can even worsen the stability of the open-loop system, which must normally be avoided. Sometimes it would be desired to obtain particular dynamics in the closed loop: i.e. that the poles have $\mathrm{Re}[\lambda] < -\overline{\lambda}$, where $\overline{\lambda}$ is a fixed value strictly greater than zero, instead of simply asking that $\mathrm{Re}[\lambda] < 0$.
Another typical specification is the rejection of a step disturbance; including an integrator in the open-loop chain (i.e. directly before the system under control) easily achieves this. Other classes of disturbances need different types of sub-systems to be included.
Other "classical" control theory specifications regard the time-response of the closed-loop system: these include the rise time (the time needed by the control system to reach the desired value after a perturbation), peak overshoot (the highest value reached by the response before reaching the desired value) and others (settling time, quarter-decay). Frequency domain specifications are usually related to robustness (see after).
Modern performance assessments use some variation of integrated tracking error
(IAE,ISA,CQI).
Model identification and robustness
A control system must always have some robustness property. A robust controller is such that its properties do not change much if applied to a system slightly different from the mathematical one used for its synthesis. This specification is important: no real physical system truly behaves like the series of differential equations used to represent it mathematically. Typically a simpler mathematical model is chosen in order to simplify calculations, otherwise the true system dynamics can be so complicated that a complete model is impossible.
System identification
The process of determining the equations that govern the model's dynamics is called system
identification. This can be done off-line: for example, by performing a series of measurements from
which to calculate an approximated mathematical model, typically its transfer function or matrix.
Such identification from the output, however, cannot take account of unobservable dynamics.
Sometimes the model is built directly starting from known physical equations: for example, in
the case of a mass-spring-damper system we know that $m \ddot{x}(t) = -K x(t) - B \dot{x}(t)$. Even
assuming that a "complete" model is used in designing the controller, all the parameters included
in these equations (called "nominal parameters") are never known with absolute precision; the
control system will have to behave correctly even when connected to a physical system with true
parameter values away from nominal.
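A minimal sketch of off-line identification follows, assuming a discrete first-order model
x[k+1] = a*x[k] + b*u[k] fitted by ordinary least squares to recorded samples. The model structure,
the synthetic data and the function name `identify` are illustrative assumptions, not a method
prescribed by the text.

```python
import math


def identify(xs, us):
    """Least-squares estimate of (a, b) in the assumed model x[k+1] = a*x[k] + b*u[k]."""
    sxx = sxu = suu = sxy = suy = 0.0
    for k in range(len(us)):
        x, u, y = xs[k], us[k], xs[k + 1]
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * y
        suy += u * y
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * suu - sxu * sxu   # close to zero if the input excites the system poorly
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b


# Synthetic "measurements" from a system with known (here hypothetical) parameters.
a_true, b_true = 0.9, 0.5
us = [math.sin(0.3 * k) for k in range(200)]
xs = [0.0]
for u in us:
    xs.append(a_true * xs[-1] + b_true * u)

a_hat, b_hat = identify(xs, us)
print(a_hat, b_hat)
```

With noise-free data the estimates match the true values up to rounding; with real measurements
they would only approximate the nominal parameters.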
Some advanced control techniques include an "on-line" identification process (see later). The
parameters of the model are calculated ("identified") while the controller itself is running: in this
way, if a drastic variation of the parameters ensues (for example, if the robot's arm releases a
weight), the controller will adjust itself accordingly in order to ensure correct performance.
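One common way to identify parameters while the system runs is recursive least squares with a
forgetting factor. The scalar sketch below is an assumption-laden illustration (single parameter
theta in y = theta*u, hypothetical data in which theta drops abruptly, as when a load is released),
not the specific technique the text refers to.

```python
def rls_track(measurements, forgetting=0.9):
    """Scalar recursive least squares for y = theta * u, with exponential forgetting."""
    theta = 0.0    # current parameter estimate
    p = 1000.0     # estimate "covariance": large means low initial confidence
    history = []
    for u, y in measurements:
        gain = p * u / (forgetting + u * p * u)
        theta += gain * (y - theta * u)          # correct by the prediction error
        p = (p - gain * u * p) / forgetting      # forgetting keeps the estimator alert
        history.append(theta)
    return history


# Hypothetical data: the true parameter changes from 2.0 to 0.5 halfway through.
measurements = [(1.0, 2.0)] * 50 + [(1.0, 0.5)] * 50
history = rls_track(measurements)
print(history[49], history[-1])
```

The estimate follows the parameter change within a few tens of samples; a forgetting factor closer
to 1 would track more slowly but be less sensitive to measurement noise.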
Analysis
Analysis of the robustness of a SISO (single input single output) control system can be
performed in the frequency domain, considering the system's transfer function and using Nyquist
and Bode diagrams. Topics include gain margin, phase margin and amplitude margin. For MIMO
(multi input multi output) and, in general, more complicated control systems one must consider
the theoretical results devised for each control technique (see next section): i.e., if particular
robustness qualities are needed, the engineer must turn to a control technique that includes them
among its properties.
Constraints
A particular robustness issue is the requirement for a control system to perform properly in the
presence of input and state constraints. In the physical world every signal is limited. It could
happen that a controller will send control signals that cannot be followed by the physical system:
for example, trying to rotate a valve at excessive speed. This can produce undesired behavior of
the closed-loop system, or even damage or break actuators or other subsystems. Specific control
techniques are available to solve the problem: model predictive control (see later), and
anti-windup systems. The latter consists of an additional control block that ensures that the
control signal never exceeds a given threshold.
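A small sketch of one anti-windup scheme, conditional integration, is given below: the control
signal is clamped to the actuator limit and the integrator is frozen while the clamp is active. The
plant x' = -x + u, the PI gains and the limit `u_max` are assumptions chosen only to make the
effect visible.

```python
def run(anti_windup, kp=4.0, ki=4.0, u_max=1.2, dt=0.001, t_end=20.0):
    """PI control of the assumed plant x' = -x + u toward a setpoint of 1.0."""
    x, integral = 0.0, 0.0
    peak_x, peak_u = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        error = 1.0 - x
        u_raw = kp * error + ki * integral
        u = max(-u_max, min(u_max, u_raw))   # the control signal never exceeds the threshold
        saturated = u != u_raw
        if not (anti_windup and saturated):  # freeze the integrator while saturated
            integral += error * dt
        x += dt * (-x + u)
        peak_x = max(peak_x, x)
        peak_u = max(peak_u, abs(u))
    return x, peak_x, peak_u


final_aw, peak_aw, peak_u_aw = run(anti_windup=True)
final_plain, peak_plain, peak_u_plain = run(anti_windup=False)
print(f"peak with anti-windup: {peak_aw:.3f}, without: {peak_plain:.3f}")
```

Both runs respect the actuator limit, but without anti-windup the integrator accumulates error
during saturation and the output overshoots the setpoint noticeably before recovering.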
4) Models
Neural network models in artificial intelligence are usually referred to as artificial neural
networks (ANNs); these are essentially simple mathematical models defining a function
$f : X \rightarrow Y$ or a distribution over $X$ or both $X$ and $Y$, but sometimes models are also
intimately associated with a particular learning algorithm or learning rule. A common use of the
phrase ANN model really means the definition of a class of such functions (where members of the
class are obtained by varying parameters, connection weights, or specifics of the architecture
such as the number of neurons or their connectivity).
Network function
See also: Graphical models
The word network in the term 'artificial neural network' refers to the interconnections between
the neurons in the different layers of each system. An example system has three layers. The first
layer has input neurons, which send data via synapses to the second layer of neurons, and then
via more synapses to the third layer of output neurons. More complex systems will have more
layers of neurons, some with additional layers of input and output neurons. The synapses store
parameters called "weights" that manipulate the data in the calculations.
An ANN is typically defined by three types of parameters:
1. The interconnection pattern between different layers of neurons
2. The learning process for updating the weights of the interconnections
3. The activation function that converts a neuron's weighted input to its output activation.
Mathematically, a neuron's network function $f(x)$ is defined as a composition of other functions
$g_i(x)$, which can further be defined as a composition of other functions. This can be conveniently
represented as a network structure, with arrows depicting the dependencies between variables. A
widely used type of composition is the nonlinear weighted sum, where
$f(x) = K\left(\sum_i w_i g_i(x)\right)$, where $K$ (commonly referred to as the activation
function [1]) is some predefined function, such as the hyperbolic tangent. It will be convenient
for the following to refer to a collection of functions $g_i$ as simply a vector
$g = (g_1, g_2, \ldots, g_n)$.
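The nonlinear weighted sum can be written out in a few lines. In this sketch the activation K is
taken to be the hyperbolic tangent, and the function name `neuron` and the example weights are
assumptions made for illustration.

```python
import math


def neuron(weights, inputs, activation=math.tanh):
    """Nonlinear weighted sum: activation applied to sum_i w_i * g_i."""
    return activation(sum(w * g for w, g in zip(weights, inputs)))


# The weighted sum 0.5*2.0 + (-0.25)*4.0 is zero, and tanh(0) = 0.
print(neuron([0.5, -0.25], [2.0, 4.0]))
# A single unit-weight input simply passes through the activation: tanh(1).
print(neuron([1.0], [1.0]))
```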
ANN dependency graph
This figure depicts such a decomposition of $f$, with dependencies between variables indicated by
arrows. These can be interpreted in two ways.
The first view is the functional view: the input $x$ is transformed into a 3-dimensional vector
$h$, which is then transformed into a 2-dimensional vector $g$, which is finally transformed into
$f$. This view is most commonly encountered in the context of optimization.
The second view is the probabilistic view: the random variable $F = f(G)$ depends upon the
random variable $G = g(H)$, which depends upon $H = h(X)$, which depends upon the random
variable $X$. This view is most commonly encountered in the context of graphical models.
The two views are largely equivalent. In either case, for this particular network architecture, the
components of individual layers are independent of each other (e.g., the components of $g$ are
independent of each other given their input $h$). This naturally enables a degree of parallelism in
the implementation.
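The functional view can be sketched as three successive layer evaluations: an input is mapped to a
3-dimensional vector, then to a 2-dimensional vector, then to the output. The weight values, the
2-dimensional input, the tanh activation and the names `layer` and `network` are all assumptions
for illustration.

```python
import math


def layer(weight_rows, inputs):
    """Each output component depends only on the layer's input, not on its siblings,
    so the components could be evaluated in parallel."""
    return [math.tanh(sum(w * v for w, v in zip(row, inputs))) for row in weight_rows]


# Hypothetical weights: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
W_H = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
W_G = [[0.3, -0.1, 0.2], [0.0, 0.5, -0.4]]
W_F = [[0.7, -0.6]]


def network(x):
    h = layer(W_H, x)   # 3-dimensional vector
    g = layer(W_G, h)   # 2-dimensional vector
    f = layer(W_F, g)   # network output
    return h, g, f


h, g, f = network([1.0, -1.0])
print(len(h), len(g), len(f))
```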
Two separate depictions of the recurrent ANN dependency graph
Networks such as the previous one are commonly called feedforward, because their graph is a
directed acyclic graph. Networks with cycles are commonly called recurrent. Such networks are
commonly depicted in the manner shown at the top of the figure, where $f$ is shown as being
dependent upon itself. However, an implied temporal dependence is not shown.
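The implied temporal dependence becomes explicit when the cycle is unrolled over time: the value at
step t depends on the input at step t and on the network's own value at step t-1. The single-unit
network, its weights and the name `run_recurrent` below are assumptions for illustration.

```python
import math


def run_recurrent(inputs, w_in=0.8, w_rec=0.5, state=0.0):
    """Unroll a one-unit recurrent network: state_t = tanh(w_in*x_t + w_rec*state_{t-1})."""
    states = []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states


# A single input pulse keeps influencing later steps through the recurrent connection.
states = run_recurrent([1.0, 0.0, 0.0])
print(states)
```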
Learning
What has attracted the most interest in neural networks is the possibility of learning. Given a
specific task to solve, and a class of functions $F$, learning means using a set of observations to
find $f^* \in F$ which solves the task in some optimal sense.
This entails defining a cost function $C : F \rightarrow \mathbb{R}$ such that, for the optimal
solution $f^*$, $C(f^*) \leq C(f) \;\; \forall f \in F$,
i.e., no solution has a cost less than the cost of the optimal solution (see Mathematical
optimization).
The cost function $C$ is an important concept in learning, as it is a measure of how far away a
particular solution is from an optimal solution to the problem to be solved. Learning algorithms
search through the solution space to find a function that has the smallest possible cost.
For applications where the solution is dependent on some data, the cost must necessarily be a
function of the observations, otherwise we would not be modelling anything related to the data. It
is frequently defined as a statistic to which only approximations can be made. As a simple
example, consider the problem of finding the model $f$ which minimizes
$C = E\left[(f(x) - y)^2\right]$, for data pairs $(x, y)$ drawn from some distribution
$\mathcal{D}$. In practical situations we would only have $N$ samples from $\mathcal{D}$ and thus,
for the above example, we would only minimize
$\hat{C} = \frac{1}{N} \sum_{i=1}^{N} (f(x_i) - y_i)^2$. Thus, the cost is minimized over a sample
of the data rather than over the entire distribution.
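The gap between the expected cost and its sample estimate can be seen in a short sketch. Here the
distribution is an assumption: x is uniform on [-1, 1] and y = 2x plus Gaussian noise with standard
deviation 0.5, so the expected squared error of the model f(x) = 2x is the noise variance, 0.25.

```python
import random


def empirical_cost(model, samples):
    """Average squared error of `model` over a finite sample of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in samples) / len(samples)


rng = random.Random(0)   # fixed seed so the sketch is reproducible


def draw_pair():
    """Draw one (x, y) pair from the assumed distribution."""
    x = rng.uniform(-1.0, 1.0)
    return x, 2.0 * x + rng.gauss(0.0, 0.5)


samples = [draw_pair() for _ in range(10000)]
cost_hat = empirical_cost(lambda x: 2.0 * x, samples)
print(cost_hat)   # close to, but not exactly, the expected cost 0.25
```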
When $N \rightarrow \infty$ some form of online machine learning must be used, where the cost is
partially minimized as each new example is seen. While online machine learning is often used when
$\mathcal{D}$ is fixed, it is most useful in the case where the distribution changes slowly over
time. In neural network methods, some form of online machine learning is frequently used for
finite datasets.
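A minimal sketch of online learning follows: the parameter is nudged after every example instead
of after a pass over a stored data set. The one-parameter model f(x) = w*x, the squared-error cost,
the learning rate and the synthetic stream are assumptions for illustration.

```python
import random


def online_fit(stream, lr=0.1):
    """Update w after each (x, y) example using the gradient of (w*x - y)^2."""
    w = 0.0
    for x, y in stream:
        w -= lr * 2.0 * (w * x - y) * x   # partial minimization on this example only
    return w


rng = random.Random(0)
# A generator stands in for an unbounded stream: examples are never stored.
stream = ((x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(2000)))
w_hat = online_fit(stream)
print(w_hat)
```

Because each update uses only the current example, the same loop keeps working if the underlying
relationship drifts slowly over time.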
Choosing a cost function
While it is possible to define some arbitrary, ad hoc cost function, frequently a particular cost
will be used, either because it has desirable properties (such as convexity) or because it arises
naturally from a particular formulation of the problem (e.g., in a probabilistic formulation the
posterior probability of the model can be used as an inverse cost). Ultimately, the cost function
will depend on the desired task. An overview of the three main categories of learning tasks is
provided below.
Learning paradigms
There are three major learning paradigms, each corresponding to a particular abstract learning
task. These are supervised learning, unsupervised learning and reinforcement learning.
Supervised learning
In supervised learning, we are given a set of example pairs $(x, y),\ x \in X,\ y \in Y$ and the aim
is to find a function $f : X \rightarrow Y$ in the allowed class of functions that matches the
examples. In other words, we wish to infer the mapping implied by the data; the cost function is
related to the mismatch between our mapping and the data and it implicitly contains prior
knowledge about the problem domain.
A commonly used cost is the mean-squared error, which tries to minimize the average squared
error between the network's output, $f(x)$, and the target value $y$ over all the example pairs.
When one tries to minimize this cost using gradient descent for the class of neural networks called
multilayer perceptrons, one obtains the common and well-known backpropagation algorithm for
training neural networks.
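The combination described above can be sketched for a tiny multilayer perceptron: one input, two
tanh hidden units and a linear output, trained by full-batch gradient descent on the mean-squared
error. The architecture, initial weights, data and learning rate are assumptions for illustration.

```python
import math


def forward(params, x):
    """Return the hidden activations and the network output for input x."""
    w1, b1, w2, b2 = params
    h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return h, sum(v * hj for v, hj in zip(w2, h)) + b2


def mse(params, data):
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)


def backprop(params, data):
    """Gradient of the mean-squared error, propagated from the output back to the inputs."""
    w1, b1, w2, b2 = params
    n = len(w1)
    g_w1, g_b1, g_w2, g_b2 = [0.0] * n, [0.0] * n, [0.0] * n, 0.0
    for x, y in data:
        h, out = forward(params, x)
        d_out = 2.0 * (out - y) / len(data)             # dC/d(output)
        g_b2 += d_out
        for j in range(n):
            g_w2[j] += d_out * h[j]
            d_pre = d_out * w2[j] * (1.0 - h[j] ** 2)   # chain rule through tanh
            g_w1[j] += d_pre * x
            g_b1[j] += d_pre
    return g_w1, g_b1, g_w2, g_b2


def train(params, data, lr=0.05, steps=200):
    """Plain gradient descent: move every parameter against its gradient."""
    for _ in range(steps):
        g_w1, g_b1, g_w2, g_b2 = backprop(params, data)
        w1, b1, w2, b2 = params
        params = ([w - lr * g for w, g in zip(w1, g_w1)],
                  [b - lr * g for b, g in zip(b1, g_b1)],
                  [w - lr * g for w, g in zip(w2, g_w2)],
                  b2 - lr * g_b2)
    return params


data = [(x, x * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]   # hypothetical targets y = x^2
initial = ([0.5, -0.3], [0.1, 0.2], [0.4, -0.6], 0.0)
trained = train(initial, data)
print(mse(initial, data), mse(trained, data))
```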
Tasks that fall within the paradigm of supervised learning are pattern recognition (also known as
classification) and regression (also known as function approximation). The supervised learning
paradigm is also applicable to sequential data (e.g., for speech and gesture recognition). This can
be thought of as learning with a "teacher," in the form of a function that provides continuous
feedback on the quality of solutions obtained thus far.
Unsupervised learning
In unsupervised learning, some data $x$ is given, together with a cost function to be minimized,
which can be any function of the data $x$ and the network's output, $f$.
The cost function is dependent on the task (what we are trying to model) and our a priori
assumptions (the implicit properties of our model, its parameters and the observed variables).
As a trivial example, consider the model $f(x) = a$, where $a$ is a constant and the cost
$C = E\left[(x - f(x))^2\right]$. Minimizing this cost will give us a value of $a$ that is equal to
the mean of the data. The cost function can be much more complicated. Its form depends on the
application: for example, in compression it could be related to the mutual information between $x$
and $f(x)$, whereas in statistical modeling, it could be related to the posterior probability of the
model given the data. (Note that in both of those examples those quantities would be maximized
rather than minimized.)
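The trivial example can be checked numerically: minimizing the average squared distance between a
constant and the data by gradient descent lands on the sample mean. The data values, learning rate
and the name `fit_constant` are assumptions for illustration.

```python
def fit_constant(data, lr=0.1, steps=200):
    """Minimize the average of (x - a)^2 over the constant a by gradient descent."""
    a = 0.0
    for _ in range(steps):
        grad = sum(-2.0 * (x - a) for x in data) / len(data)   # derivative of the cost in a
        a -= lr * grad
    return a


data = [1.0, 2.0, 3.0, 6.0]
a_hat = fit_constant(data)
print(a_hat, sum(data) / len(data))   # both are (numerically) the mean, 3.0
```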
Tasks that fall within the paradigm of unsupervised learning are in general estimation problems;
the applications include clustering, the estimation of statistical distributions, compression and
filtering.
Reinforcement learning
In reinforcement learning, data $x$ are usually not given, but generated by an agent's interactions
with the environment. At each point in time $t$, the agent performs an action $y_t$ and the
environment generates an observation $x_t$ and an instantaneous cost $c_t$, according to some
(usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes
some measure of a long-term cost; i.e., the expected cumulative cost. The environment's
dynamics and the long-term cost for each policy are usually unknown, but can be estimated.
More formally, the environment is modeled as a Markov decision process (MDP) with states
$s_1, \ldots, s_n \in S$ and actions $a_1, \ldots, a_m \in A$ with the following probability
distributions: the instantaneous cost distribution $P(c_t \mid s_t)$, the observation distribution
$P(x_t \mid s_t)$ and the transition $P(s_{t+1} \mid s_t, a_t)$, while a policy is defined as a
conditional distribution over actions given the observations. Taken together, the two define a
Markov chain (MC). The aim is to discover the policy that minimizes the cost; i.e., the MC for
which the cost is minimal.
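Comparing policies by their long-term cost can be sketched on a toy MDP. Several simplifications
here are assumptions, not part of the text: the state is fully observed, costs are given as
expected values per state-action pair, future costs are discounted by a factor `gamma`, and the
two-state environment is invented for the example.

```python
def evaluate(policy, cost, transition, gamma=0.9, sweeps=500):
    """Iteratively estimate the expected discounted cumulative cost of a fixed policy.

    policy[s][a]         probability of taking action a in state s
    cost[s][a]           expected instantaneous cost of action a in state s
    transition[s][a][t]  probability of moving from state s to state t under action a
    """
    value = [0.0] * len(cost)
    for _ in range(sweeps):
        value = [
            sum(
                p_a * (cost[s][a]
                       + gamma * sum(p * value[t] for t, p in enumerate(transition[s][a])))
                for a, p_a in enumerate(policy[s])
            )
            for s in range(len(cost))
        ]
    return value


# Toy environment: state 0 costs 1 per step to stay in; paying 2 moves to the free state 1.
cost = [[1.0, 2.0], [0.0, 2.0]]                        # actions: 0 = stay, 1 = move
transition = [[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]]
always_stay = [[1.0, 0.0], [1.0, 0.0]]
move_then_stay = [[0.0, 1.0], [1.0, 0.0]]

v_stay = evaluate(always_stay, cost, transition)
v_move = evaluate(move_then_stay, cost, transition)
print(v_stay[0], v_move[0])   # the second policy has the lower long-term cost from state 0
```

Fixing a policy turns the MDP into a Markov chain, which is why each policy can be evaluated on
its own and the results compared.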
ANNs are frequently used in reinforcement learning as part of the overall algorithm. Dynamic
programming has been coupled with ANNs (neuro-dynamic programming) by Bertsekas and
Tsitsiklis [2] and applied to multi-dimensional nonlinear problems such as those involved in
vehicle routing or natural resources management because of the ability of ANNs to mitigate
losses of accuracy even when reducing the discretization grid density for numerically
approximating the solution of the original control problems.
Tasks that fall within the paradigm of reinforcement learning are control problems, games and
other sequential decision making tasks.
Learning algorithms
Training a neural network model essentially means selecting one model from the set of allowed
models (or, in a Bayesian framework, determining a distribution over the set of allowed models)
that minimizes the cost criterion. There are numerous algorithms available for training neural
network models; most of them can be viewed as a straightforward application of optimization
theory and statistical estimation.
Most of the algorithms used in training artificial neural networks employ some form of gradient
descent. This is done by simply taking the derivative of the cost function with respect to the
network parameters and then changing those parameters in a gradient-related direction.
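In its simplest form, the update described above can be written as follows, where the symbols are
introduced here for illustration: $\theta_k$ denotes the network parameters at iteration $k$, $C$
the cost function and $\eta$ a small positive step size (learning rate).

```latex
\theta_{k+1} = \theta_k - \eta \, \nabla_{\theta} C(\theta_k), \qquad \eta > 0
```

Other gradient-related directions (for example, with momentum or a rescaled gradient) replace the
term $\nabla_{\theta} C(\theta_k)$ with a modified direction but keep the same overall structure.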
Evolutionary methods [3], simulated annealing [4], expectation-maximization, non-parametric
methods and particle swarm optimization [5] are some commonly used methods for training neural
networks.
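As a gradient-free illustration, the sketch below uses a (1+1) evolution strategy, one of the
simplest evolutionary methods: a single parent is mutated with Gaussian noise and the child
replaces it only if its cost is no worse. The quadratic cost standing in for a network's training
cost, the mutation scale and the fixed seed are assumptions made for this example.

```python
import random


def evolve(cost, start, sigma=0.1, generations=2000, seed=0):
    """(1+1) evolution strategy: mutate the parent, keep the child if it is no worse."""
    rng = random.Random(seed)
    best, best_cost = list(start), cost(start)
    for _ in range(generations):
        child = [p + rng.gauss(0.0, sigma) for p in best]
        child_cost = cost(child)
        if child_cost <= best_cost:   # selection step: no derivatives are needed
            best, best_cost = child, child_cost
    return best, best_cost


def quadratic_cost(params):
    """Hypothetical stand-in for a training cost, minimized at (1, -2)."""
    return (params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2


best, best_cost = evolve(quadratic_cost, start=[3.0, -2.0])
print(best, best_cost)
```

Because only cost evaluations are used, the same loop applies when the cost is not differentiable,
at the price of many more evaluations than gradient descent would need.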