
NEURAL NETWORKS (5 MARKS)

1) The term neural network was traditionally used to refer to a network or circuit of biological neurons.[1] The modern usage of the term often refers to artificial neural networks, which are composed of artificial neurons or nodes. Thus the term has two distinct usages:

1. Biological neural networks are made up of real biological neurons that are connected or functionally related in a nervous system. In the field of neuroscience, they are often identified as groups of neurons that perform a specific physiological function in laboratory analysis.

2. Artificial neural networks are composed of interconnecting artificial neurons (programming constructs that mimic the properties of biological neurons). Artificial neural networks may either be used to gain an understanding of biological neural networks, or for solving artificial intelligence problems without necessarily creating a model of a real biological system. The real, biological nervous system is highly complex: artificial neural network algorithms attempt to abstract this complexity and focus on what may hypothetically matter most from an information processing point of view. Good performance (e.g. as measured by good predictive ability, low generalization error), or performance mimicking animal or human error patterns, can then be used as one source of evidence towards supporting the hypothesis that the abstraction really captured something important from the point of view of information processing in the brain. Another incentive for these abstractions is to reduce the amount of computation required to simulate artificial neural networks, so as to allow one to experiment with larger networks and train them on larger data sets.

2) Neural networks and artificial intelligence

A neural network (NN), in the case of artificial neurons called an artificial neural network (ANN) or simulated neural network (SNN), is an interconnected group of natural or artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network.

In more practical terms, neural networks are non-linear statistical data modeling or decision-making tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.

However, the paradigm of neural networks (in which implicit rather than explicit learning is stressed) seems to correspond more to some kind of natural intelligence than to traditional symbol-based Artificial Intelligence, which would instead stress rule-based learning.

Background

An artificial neural network involves a network of simple processing elements (artificial neurons) which can exhibit complex global behavior, determined by the connections between the processing elements and element parameters. Artificial neurons were first proposed in 1943 by Warren McCulloch, a neurophysiologist, and Walter Pitts, a logician, who first collaborated at the University of Chicago.[19] One classical type of artificial neural network is the recurrent Hopfield net.

In a neural network model, simple nodes (which can be called by a number of names, including "neurons", "neurodes", "Processing Elements" (PE) and "units") are connected together to form a network of nodes, hence the term "neural network". While a neural network does not have to be adaptive per se, its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow.
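As a minimal illustration, the following Python sketch implements a single artificial neuron: a weighted sum of inputs passed through a sigmoid activation. The weights, bias and input values are assumptions chosen for the example, not values from the text.

    # A minimal sketch of one artificial neuron: weighted sum + activation.
    import math

    def neuron(inputs, weights, bias):
        # Weighted sum of the inputs plus a bias term ...
        s = sum(w * x for w, x in zip(weights, inputs)) + bias
        # ... squashed through a sigmoid activation function
        return 1.0 / (1.0 + math.exp(-s))

    # Two-input neuron; altering the weights alters the signal flow
    print(neuron([0.5, -1.0], weights=[0.8, 0.2], bias=0.1))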

In modern software implementations of artificial neural networks, the approach inspired by biology has more or less been abandoned for a more practical approach based on statistics and signal processing. In some of these systems, neural networks, or parts of neural networks (such as artificial neurons), are used as components in larger systems that combine both adaptive and non-adaptive elements.

The concept of a neural network appears to have first been proposed by Alan Turing in his 1948 paper "Intelligent Machinery".

Applications of natural and of artificial neural networks

The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations and also to use it. Unsupervised neural networks can also be used to learn representations of the input that capture the salient characteristics of the input distribution, e.g., see the Boltzmann machine (1983), and more recently, deep learning algorithms, which can implicitly learn the distribution function of the observed data. Learning in neural networks is particularly useful in applications where the complexity of the data or task makes the design of such functions by hand impractical.

The tasks to which artificial neural networks are applied tend to fall within the following broad categories:

- Function approximation, or regression analysis, including time series prediction and modeling.
- Classification, including pattern and sequence recognition, novelty detection and sequential decision making.
- Data processing, including filtering, clustering, blind signal separation and compression.

Application areas of ANNs include system identification and control (vehicle control, process control), game-playing and decision making (backgammon, chess, racing), pattern recognition (radar systems, face identification, object recognition), sequence recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial applications, data mining (or knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering.

3) Artificial intelligence (AI) is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents"[1] where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success.[2] John McCarthy, who coined the term in 1955,[3] defines it as "the science and engineering of making intelligent machines."[4]

AI research is highly technical and specialized, deeply divided into subfields that often fail to communicate with each other.[5] Some of the division is due to social and cultural factors: subfields have grown up around particular institutions and the work of individual researchers. AI research is also divided by several technical issues. There are subfields which are focused on the solution of specific problems, on one of several possible approaches, on the use of widely differing tools and towards the accomplishment of particular applications. The central problems of AI include such traits as reasoning, knowledge, planning, learning, communication, perception and the ability to move and manipulate objects.[6] General intelligence (or "strong AI") is still among the field's long term goals.[7]

Currently popular approaches include statistical methods, computational intelligence and traditional symbolic AI. There are an enormous number of tools used in AI, including versions of search and mathematical optimization, logic, methods based on probability and economics, and many others.

The field was founded on the claim that a central property of humans, intelligence (the sapience of Homo sapiens), can be so precisely described that it can be simulated by a machine.[8] This raises philosophical issues about the nature of the mind and the ethics of creating artificial beings, issues which have been addressed by myth, fiction and philosophy since antiquity.[9] Artificial intelligence has been the subject of optimism,[10] but has also suffered setbacks[11] and, today, has become an essential part of the technology industry, providing the heavy lifting for many of the most difficult problems in computer science.[12]

20 Marks

1) Artificial intelligence (AI) is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents"[1] where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success.[2]

Problems

The general problem of simulating (or creating) intelligence has been broken down into a number of specific sub-problems. These consist of particular traits or capabilities that researchers would like an intelligent system to display. The traits described below have received the most attention.[6]
[6]

Deduction, reasoning, problem solving

Early AI researchers developed algorithms that imitated the step-by-step reasoning that humans use when they solve puzzles or make logical deductions.[39] By the late 1980s and '90s, AI research had also developed highly successful methods for dealing with uncertain or incomplete information, employing concepts from probability and economics.[40]

For difficult problems, most of these algorithms can require enormous computational resources: most experience a "combinatorial explosion", where the amount of memory or computer time required becomes astronomical when the problem goes beyond a certain size. The search for more efficient problem-solving algorithms is a high priority for AI research.[41]

Human beings solve most of their problems using fast, intuitive judgments rather than the conscious, step-by-step deduction that early AI research was able to model.[42] AI has made some progress at imitating this kind of "sub-symbolic" problem solving: embodied agent approaches emphasize the importance of sensorimotor skills to higher reasoning; neural net research attempts to simulate the structures inside human and animal brains that give rise to this skill; statistical approaches to AI mimic the probabilistic nature of the human ability to guess.

Knowledge representation

Knowledge representation[43] and knowledge engineering[44] are central to AI research. Many of the problems machines are expected to solve will require extensive knowledge about the world. Among the things that AI needs to represent are: objects, properties, categories and relations between objects;[45] situations, events, states and time;[46] causes and effects;[47] knowledge about knowledge (what we know about what other people know);[48] and many other, less well researched domains. A representation of "what exists" is an ontology (borrowing a word from traditional philosophy), of which the most general are called upper ontologies.[49]

Among the most difficult problems in knowledge representation are:

Default reasoning and the qualification problem

Many of the things people know take the form of "working assumptions." For example, if a bird comes up in conversation, people typically picture an animal that is fist sized, sings, and flies. None of these things are true about all birds. John McCarthy identified this problem in 1969[50] as the qualification problem: for any commonsense rule that AI researchers care to represent, there tend to be a huge number of exceptions. Almost nothing is simply true or false in the way that abstract logic requires. AI research has explored a number of solutions to this problem.[51]

The breadth of commonsense knowledge

The number of atomic facts that the average person knows is astronomical. Research projects that attempt to build a complete knowledge base of commonsense knowledge (e.g., Cyc) require enormous amounts of laborious ontological engineering: they must be built, by hand, one complicated concept at a time.[52] A major goal is to have the computer understand enough concepts to be able to learn by reading from sources like the internet, and thus be able to add to its own ontology.

The subsymbolic form of some commonsense knowledge

Much of what people know is not represented as "facts" or "statements" that they could express verbally. For example, a chess master will avoid a particular chess position because it "feels too exposed"[53] or an art critic can take one look at a statue and instantly realize that it is a fake.[54] These are intuitions or tendencies that are represented in the brain non-consciously and sub-symbolically.[55] Knowledge like this informs, supports and provides a context for symbolic, conscious knowledge. As with the related problem of sub-symbolic reasoning, it is hoped that situated AI, computational intelligence, or statistical AI will provide ways to represent this kind of knowledge.[55]


Planning

Intelligent agents must be able to set goals and achieve them.[56] They need a way to visualize the future (they must have a representation of the state of the world and be able to make predictions about how their actions will change it) and be able to make choices that maximize the utility (or "value") of the available choices.[57]

In classical planning problems, the agent can assume that it is the only thing acting on the world and it can be certain what the consequences of its actions may be.[58] However, if the agent is not the only actor, it must periodically ascertain whether the world matches its predictions and it must change its plan as this becomes necessary, requiring the agent to reason under uncertainty.[59]

Multi-agent planning uses the cooperation and competition of many agents to achieve a given goal. Emergent behavior such as this is used by evolutionary algorithms and swarm intelligence.[60]

Learning

Machine learning[61] has been central to AI research from the beginning.[62] In 1956, at the original Dartmouth AI summer conference, Ray Solomonoff wrote a report on unsupervised probabilistic machine learning: "An Inductive Inference Machine".[63] Unsupervised learning is the ability to find patterns in a stream of input. Supervised learning includes both classification and numerical regression. Classification is used to determine what category something belongs in, after seeing a number of examples of things from several categories. Regression is the attempt to produce a function that describes the relationship between inputs and outputs and predicts how the outputs should change as the inputs change. In reinforcement learning[64] the agent is rewarded for good responses and punished for bad ones. These can be analyzed in terms of decision theory, using concepts like utility. The mathematical analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory.[65]

Natural language processing

Natural language processing[66] gives machines the ability to read and understand the languages that humans speak. A sufficiently powerful natural language processing system would enable natural language user interfaces and the acquisition of knowledge directly from human-written sources, such as Internet texts. Some straightforward applications of natural language processing include information retrieval (or text mining) and machine translation.[67]

A common method of processing and extracting meaning from natural language is semantic indexing. Increases in processing speeds and the drop in the cost of data storage make indexing large volumes of abstractions of the user's input much more efficient.

Motion and manipulation

The field of robotics[68] is closely related to AI. Intelligence is required for robots to be able to handle such tasks as object manipulation[69] and navigation, with sub-problems of localization (knowing where you are), mapping (learning what is around you) and motion planning (figuring out how to get there).[70]

Perception

Machine perception[71] is the ability to use input from sensors (such as cameras, microphones, sonar and others more exotic) to deduce aspects of the world. Computer vision[72] is the ability to analyze visual input. A few selected subproblems are speech recognition,[73] facial recognition and object recognition.[74]

Social intelligence


Affective computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects.[76][77] It is an interdisciplinary field spanning computer sciences, psychology, and cognitive science.[78] While the origins of the field may be traced as far back as to early philosophical enquiries into emotion,[79] the more modern branch of computer science originated with Rosalind Picard's 1995 paper[80] on affective computing.[81][82]

A motivation for the research is the ability to simulate empathy. The machine should interpret the emotional state of humans and adapt its behaviour to them, giving an appropriate response for those emotions.

Emotion and social skills[83] play two roles for an intelligent agent. First, it must be able to predict the actions of others, by understanding their motives and emotional states. (This involves elements of game theory, decision theory, as well as the ability to model human emotions and the perceptual skills to detect emotions.) Also, in an effort to facilitate human-computer interaction, an intelligent machine might want to be able to display emotions (even if it does not actually experience them itself) in order to appear sensitive to the emotional dynamics of human interaction.


Creativity

A sub-field of AI addresses creativity both theoretically (from a philosophical and psychological perspective) and practically (via specific implementations of systems that generate outputs that can be considered creative, or systems that identify and assess creativity). Related areas of computational research are artificial intuition and artificial imagination.

General intelligence

Most researchers hope that their work will eventually be incorporated into a machine with general intelligence (known as strong AI), combining all the skills above and exceeding human abilities at most or all of them.[7] A few believe that anthropomorphic features like artificial consciousness or an artificial brain may be required for such a project.[84][85]

Many of the problems above are considered AI-complete: to solve one problem, you must solve them all. For example, even a straightforward, specific task like machine translation requires that the machine follow the author's argument (reason), know what is being talked about (knowledge), and faithfully reproduce the author's intention (social intelligence). Machine translation, therefore, is believed to be AI-complete: it may require strong AI to be done as well as humans can do it.

2) Classical control theory

To avoid the problems of the open-loop controller, control theory introduces feedback. A closed-loop controller uses feedback to control states or outputs of a dynamical system. Its name comes from the information path in the system: process inputs (e.g., voltage applied to an electric motor) have an effect on the process outputs (e.g., speed or torque of the motor), which is measured with sensors and processed by the controller; the result (the control signal) is used as input to the process, closing the loop.

Closed-loop controllers have the following advantages over open-loop controllers:

- disturbance rejection (such as unmeasured friction in a motor)
- guaranteed performance even with model uncertainties, when the model structure does not match perfectly the real process and the model parameters are not exact
- unstable processes can be stabilized
- reduced sensitivity to parameter variations
- improved reference tracking performance

In some systems, closed-loop and open-loop control are used simultaneously. In such systems, the open-loop control is termed feedforward and serves to further improve reference tracking performance.

A common closed-loop controller architecture is the PID controller.

Closed-loop transfer function

The output of the system y(t) is fed back through a sensor measurement F to the reference value r(t). The controller C then takes the error e (the difference) between the reference and the output to change the inputs u to the system under control P. This is shown in the figure. This kind of controller is a closed-loop controller or feedback controller.

This is called a single-input-single-output (SISO) control system; MIMO (i.e., Multi-Input-Multi-Output) systems, with more than one input/output, are common. In such cases variables are represented through vectors instead of simple scalar values. For some distributed parameter systems the vectors may be infinite-dimensional (typically functions).


If we assume the controller C, the plant P, and the sensor F are linear and time-invariant (i.e., the elements of their transfer functions C(s), P(s), and F(s) do not depend on time), the systems above can be analysed using the Laplace transform on the variables. This gives the following relations:

Y(s) = P(s) U(s)
U(s) = C(s) E(s)
E(s) = R(s) - F(s) Y(s)

Solving for Y(s) in terms of R(s) gives:

Y(s) = [ P(s) C(s) / (1 + F(s) P(s) C(s)) ] R(s) = H(s) R(s)


The expression H(s) = P(s) C(s) / (1 + F(s) P(s) C(s)) is referred to as the closed-loop transfer function of the system. The numerator is the forward (open-loop) gain from r to y, and the denominator is one plus the gain in going around the feedback loop, the so-called loop gain. If |P(s) C(s)| >> 1, i.e., it has a large norm with each value of s, and if |F(s)| ≈ 1, then Y(s) is approximately equal to R(s) and the output closely tracks the reference input.
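The following Python sketch illustrates this tracking property numerically. It assumes, purely for illustration, a first-order plant P(s) = 1/(s + 1), a unity sensor F(s) = 1 and a proportional controller C(s) = K, and evaluates |H(jω)| as the loop gain K grows.

    # Sketch: evaluate H(s) = P(s)C(s) / (1 + F(s)P(s)C(s)) at s = j*omega
    # for an assumed plant, unity sensor and proportional controller.
    import numpy as np

    def H(s, K):
        P = 1.0 / (s + 1.0)   # assumed first-order plant
        C = K                 # proportional controller
        F = 1.0               # unity sensor
        return P * C / (1.0 + F * P * C)

    omega = 1.0               # evaluate at 1 rad/s
    for K in (1.0, 10.0, 100.0):
        gain = abs(H(1j * omega, K))
        print(f"K={K:6.1f}  |H(j{omega})| = {gain:.4f}")
    # As the loop gain grows, |H| approaches 1: the output tracks the reference.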

PID controller

The PID controller is probably the most-used feedback control design. PID is an acronym for Proportional-Integral-Derivative, referring to the three terms operating on the error signal to produce a control signal. If u(t) is the control signal sent to the system, y(t) is the measured output, r(t) is the desired output, and e(t) = r(t) - y(t) is the tracking error, a PID controller has the general form

u(t) = K_P e(t) + K_I ∫ e(τ) dτ + K_D de(t)/dt

The desired closed-loop dynamics is obtained by adjusting the three parameters K_P, K_I and K_D, often iteratively by "tuning" and without specific knowledge of a plant model. Stability can often be ensured using only the proportional term. The integral term permits the rejection of a step disturbance (often a striking specification in process control). The derivative term is used to provide damping or shaping of the response. PID controllers are the most well-established class of control systems; however, they cannot be used in several more complicated cases, especially if MIMO systems are considered.
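A minimal discrete-time implementation of this control law might look as follows; the gains, time step and the first-order plant used to exercise it are assumptions for the example.

    # Minimal discrete-time PID: u = Kp*e + Ki*integral(e) + Kd*de/dt.
    class PID:
        def __init__(self, kp, ki, kd, dt):
            self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
            self.integral = 0.0
            self.prev_error = 0.0

        def update(self, r, y):
            e = r - y                         # tracking error e = r - y
            self.integral += e * self.dt     # integral term accumulates error
            derivative = (e - self.prev_error) / self.dt
            self.prev_error = e
            return self.kp * e + self.ki * self.integral + self.kd * derivative

    # Drive an assumed first-order plant y' = -y + u toward the setpoint r = 1
    pid, y, dt = PID(kp=2.0, ki=1.0, kd=0.1, dt=0.01), 0.0, 0.01
    for step in range(500):
        u = pid.update(r=1.0, y=y)
        y += dt * (-y + u)                   # Euler step of the plant
    print(round(y, 3))                       # should be close to 1.0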

Applying the Laplace transformation results in the transformed PID controller equation

u(s) = (K_P + K_I/s + K_D s) e(s)

with the PID controller transfer function

C(s) = K_P + K_I/s + K_D s

Modern control theory

In contrast to the frequency domain analysis of the classical control theory, modern control theory utilizes the time-domain state space representation, a mathematical model of a physical system as a set of input, output and state variables related by first-order differential equations. To abstract from the number of inputs, outputs and states, the variables are expressed as vectors and the differential and algebraic equations are written in matrix form (the latter only being possible when the dynamical system is linear). The state space representation (also known as the "time-domain approach") provides a convenient and compact way to model and analyze systems with multiple inputs and outputs. With p inputs and q outputs, we would otherwise have to write down q × p Laplace transforms to encode all the information about a system. Unlike the frequency domain approach, the use of the state space representation is not limited to systems with linear components and zero initial conditions. "State space" refers to the space whose axes are the state variables. The state of the system can be represented as a vector within that space.[5]
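As a small illustration, the sketch below builds state-space matrices for an assumed mass-spring-damper system (states: position and velocity) and integrates x' = Ax + Bu with a crude Euler scheme; all numerical values are illustrative.

    # Sketch of the state-space form x' = A x + B u, y = C x for an assumed
    # mass-spring-damper (m=1, K=1, B=0.5), integrated with an Euler scheme.
    import numpy as np

    m, K, B = 1.0, 1.0, 0.5
    A = np.array([[0.0, 1.0],
                  [-K / m, -B / m]])    # states: position and velocity
    Bmat = np.array([[0.0], [1.0 / m]])
    C = np.array([[1.0, 0.0]])          # output: position

    x = np.array([[1.0], [0.0]])        # initial condition: displaced, at rest
    u, dt = 0.0, 0.01
    for _ in range(1000):               # simulate 10 seconds with zero input
        x = x + dt * (A @ x + Bmat * u)
    print((C @ x).item())               # position decays toward zero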

System classifications

Linear systems control

For MIMO systems, pole placement can be performed mathematically using a state space representation of the open-loop system and calculating a feedback matrix assigning poles in the desired positions. In complicated systems this can require computer-assisted calculation capabilities, and cannot always ensure robustness. Furthermore, all system states are in general not measured, and so observers must be included and incorporated in pole placement design.

Nonlinear systems control

Processes in industries like robotics and the aerospace industry typically have strong nonlinear dynamics. In control theory it is sometimes possible to linearize such classes of systems and apply linear techniques, but in many cases it can be necessary to devise from scratch theories permitting control of nonlinear systems. These, e.g., feedback linearization, backstepping, sliding mode control, and trajectory linearization control, normally take advantage of results based on Lyapunov's theory. Differential geometry has been widely used as a tool for generalizing well-known linear control concepts to the non-linear case, as well as showing the subtleties that make it a more challenging problem.

Decentralized systems

When the system is controlled by multiple controllers, the problem is one of decentralized control. Decentralization is helpful in many ways; for instance, it helps control systems operate over a larger geographical area. The agents in decentralized control systems can interact using communication channels and coordinate their actions.

3) Topics in control theory

Stability

The stability of a general dynamical system with no input can be described with Lyapunov stability criteria. A linear system that takes an input is called bounded-input bounded-output (BIBO) stable if its output will stay bounded for any bounded input. Stability for nonlinear systems that take an input is input-to-state stability (ISS), which combines Lyapunov stability and a notion similar to BIBO stability. For simplicity, the following descriptions focus on continuous-time and discrete-time linear systems.

Mathematically, this means that for a causal linear system to be stable, all of the poles of its transfer function must have negative real values, i.e. the real part of each pole must be less than zero. Practically speaking, stability requires that the transfer function's complex poles reside

- in the open left half of the complex plane for continuous time, when the Laplace transform is used to obtain the transfer function;
- inside the unit circle for discrete time, when the Z-transform is used.

The difference between the two cases is simply due to the traditional method of plotting continuous time versus discrete time transfer functions. The continuous Laplace transform is in Cartesian coordinates, where the horizontal axis is the real axis, and the discrete Z-transform is in circular coordinates, where the radial axis is the real axis.


When the appropriate conditions above are satisfied, a system is said to be asymptotically stable: the variables of an asymptotically stable control system always decrease from their initial value and do not show permanent oscillations. Permanent oscillations occur when a pole has a real part exactly equal to zero (in the continuous time case) or a modulus equal to one (in the discrete time case). If a simply stable system response neither decays nor grows over time, and has no oscillations, it is marginally stable: in this case the system transfer function has non-repeated poles at the complex plane origin (i.e. their real and imaginary components are zero in the continuous time case). Oscillations are present when poles with real part equal to zero have an imaginary part not equal to zero.


If a system in question has an impulse response of

x[n] = a^n u[n], with |a| < 1,

then the Z-transform is

X(z) = 1 / (1 - a z^(-1)),

which has a pole at z = a (zero imaginary part). This system is BIBO (asymptotically) stable since the pole is inside the unit circle.

However, if the impulse response were

x[n] = a^n u[n], with |a| > 1,

then the Z-transform is

X(z) = 1 / (1 - a z^(-1)),

which has a pole at z = a and is not BIBO stable since the pole has a modulus strictly greater than one.
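The pole-modulus test is easy to mechanize. The sketch below checks discrete-time BIBO stability from a list of poles; the example values 0.5 and 1.5 are assumptions standing in for the two cases above.

    # Sketch: discrete-time BIBO stability check from pole moduli.
    def bibo_stable(poles):
        # Every pole must lie strictly inside the unit circle
        return all(abs(p) < 1.0 for p in poles)

    print(bibo_stable([0.5]))   # True: pole inside the unit circle
    print(bibo_stable([1.5]))   # False: modulus greater than one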

Numerous tools exist for the analysis of the poles of a system. These include graphical systems like the root locus, Bode plots or Nyquist plots.

Mechanical changes can make equipment (and control systems) more stable. Sailors add ballast to improve the stability of ships. Cruise ships use antiroll fins that extend transversely from the side of the ship for perhaps 30 feet (10 m) and are continuously rotated about their axes to develop forces that oppose the roll.

Controllability and observability

Controllability and observability are main issues in the analysis of a system before deciding the best control strategy to be applied, or whether it is even possible to control or stabilize the system. Controllability is related to the possibility of forcing the system into a particular state by using an appropriate control signal. If a state is not controllable, then no signal will ever be able to control the state. If a state is not controllable, but its dynamics are stable, then the state is termed stabilizable. Observability instead is related to the possibility of "observing", through output measurements, the state of a system. If a state is not observable, the controller will never be able to determine the behaviour of an unobservable state and hence cannot use it to stabilize the system. However, similar to the stabilizability condition above, if a state cannot be observed it might still be detectable.

From a geometrical point of view, looking at the states of each variable of the system to be controlled, every "bad" state of these variables must be controllable and observable to ensure a good behaviour in the closed-loop system. That is, if one of the eigenvalues of the system is not both controllable and observable, this part of the dynamics will remain untouched in the closed-loop system. If such an eigenvalue is not stable, the dynamics of this eigenvalue will be present in the closed-loop system, which therefore will be unstable. Unobservable poles are not present in the transfer function realization of a state-space representation, which is why sometimes the latter is preferred in dynamical systems analysis.

Solutions to problems of an uncontrollable or unobservable system include adding actuators and sensors.


Control specification

Several different control strategies have been devised in the past years. These vary from extremely general ones (PID controller), to others devoted to very particular classes of systems (especially robotics or aircraft cruise control).

A control problem can have several specifications. Stability, of course, is always present: the controller must ensure that the closed-loop system is stable, regardless of the open-loop stability. A poor choice of controller can even worsen the stability of the open-loop system, which must normally be avoided. Sometimes it would be desired to obtain particular dynamics in the closed loop: i.e. that the poles have Re[λ] < -λ̄, where λ̄ is a fixed value strictly greater than zero, instead of simply asking that Re[λ] < 0.

Another typical specification is the rejection of a step disturbance; including an integrator in the open-loop chain (i.e. directly before the system under control) easily achieves this. Other classes of disturbances need different types of sub-systems to be included.

Other "classical" control theory specifications regard the time
-
response of the close
d
-
loop
system: these include the
rise time

(the time needed by the control system to reach the desired
value after a perturbation), peak
overshoot

(the highest value reached by the response before
reaching the desired value) and others (
settling time
, quarter
-
decay). Freq
uency domain
specifications are usually related to
robustness

(see after).

Modern performance assessments use some variation of integrated tracking error (IAE, ISA, CQI).

Model identification and robustness

A control system must always have some robustness property. A robust controller is such that its properties do not change much if applied to a system slightly different from the mathematical one used for its synthesis. This specification is important: no real physical system truly behaves like the series of differential equations used to represent it mathematically. Typically a simpler mathematical model is chosen in order to simplify calculations; otherwise, the true system dynamics can be so complicated that a complete model is impossible.

System identification

The process of determining the equations that govern the model's dynamics is called system identification. This can be done off-line: for example, executing a series of measures from which to calculate an approximated mathematical model, typically its transfer function or matrix. Such identification from the output, however, cannot take account of unobservable dynamics. Sometimes the model is built directly starting from known physical equations: for example, in the case of a mass-spring-damper system we know that m x''(t) = -K x(t) - B x'(t). Even assuming that a "complete" model is used in designing the controller, all the parameters included in these equations (called "nominal parameters") are never known with absolute precision; the control system will have to behave correctly even when connected to a physical system with true parameter values away from nominal.
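As a sketch of off-line identification under these ideas, the following Python code simulates a mass-spring-damper with assumed "true" parameters, then recovers K and B from noisy records by least squares; every numerical value is illustrative.

    # Sketch: estimate K and B in m*x'' = -K*x - B*x' by least squares
    # from simulated position/velocity/acceleration records.
    import numpy as np

    rng = np.random.default_rng(0)
    m, K_true, B_true, dt = 1.0, 2.0, 0.4, 0.01
    x, v = 1.0, 0.0
    X, V, Acc = [], [], []
    for _ in range(2000):
        a = (-K_true * x - B_true * v) / m
        X.append(x); V.append(v); Acc.append(a + rng.normal(0, 0.01))
        v += dt * a
        x += dt * v

    # m*a = -K*x - B*v  =>  fit [x v] @ [K B]^T = -m*a in least squares
    Phi = np.column_stack([X, V])
    theta, *_ = np.linalg.lstsq(Phi, -m * np.array(Acc), rcond=None)
    print(theta)   # approximately [K_true, B_true]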

Some advanced control techniques include an "on-line" identification process (see later). The parameters of the model are calculated ("identified") while the controller itself is running: in this way, if a drastic variation of the parameters ensues (for example, if the robot's arm releases a weight), the controller will adjust itself consequently in order to ensure the correct performance.

Analysis

Analysis of the robustness of a SISO (single input single output) control system can be performed in the frequency domain, considering the system's transfer function and using Nyquist and Bode diagrams. Topics include gain and phase margin and amplitude margin. For MIMO (multi input multi output) and, in general, more complicated control systems, one must consider the theoretical results devised for each control technique (see next section): i.e., if particular robustness qualities are needed, the engineer must shift his attention to a control technique by including them in its properties.

Constraints

A particular robustness issue is the requirement for a control system to perform properly in the presence of input and state constraints. In the physical world every signal is limited. It could happen that a controller will send control signals that cannot be followed by the physical system: for example, trying to rotate a valve at excessive speed. This can produce undesired behavior of the closed-loop system, or even damage or break actuators or other subsystems. Specific control techniques are available to solve the problem: model predictive control (see later), and anti-windup systems. The latter consists of an additional control block that ensures that the control signal never exceeds a given threshold.

4) Models

Neural network models in artificial intelligence are usually referred to as artificial neural networks (ANNs); these are essentially simple mathematical models defining a function f : X → Y or a distribution over X or both X and Y, but sometimes models are also intimately associated with a particular learning algorithm or learning rule. A common use of the phrase ANN model really means the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity).


Network function

See also: Graphical models

The word network in the term 'artificial neural network' refers to the interconnections between the neurons in the different layers of each system. An example system has three layers. The first layer has input neurons, which send data via synapses to the second layer of neurons, and then via more synapses to the third layer of output neurons. More complex systems will have more layers of neurons, some having increased layers of input neurons and output neurons. The synapses store parameters called "weights" that manipulate the data in the calculations.

An ANN is typically defined by three types of parameters:

1. The interconnection pattern between different layers of neurons
2. The learning process for updating the weights of the interconnections
3. The activation function that converts a neuron's weighted input to its output activation.

Mathematically, a neuron's network function f(x) is defined as a composition of other functions g_i(x), which can further be defined as a composition of other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables. A widely used type of composition is the nonlinear weighted sum, where f(x) = K(Σ_i w_i g_i(x)), where K (commonly referred to as the activation function[1]) is some predefined function, such as the hyperbolic tangent. It will be convenient for the following to refer to a collection of functions g_i as simply a vector g = (g_1, g_2, ..., g_n).



[Figure: ANN dependency graph]

This figure depicts such a decomposition of f, with dependencies between variables indicated by arrows. These can be interpreted in two ways.

The first view is the functional view: the input x is transformed into a 3-dimensional vector h, which is then transformed into a 2-dimensional vector g, which is finally transformed into f. This view is most commonly encountered in the context of optimization.

The second view is the probabilistic view: the random variable F = f(G) depends upon the random variable G = g(H), which depends upon H = h(X), which depends upon the random variable X. This view is most commonly encountered in the context of graphical models.

The two views are largely equivalent. In either case, for this particular network architecture, the components of individual layers are independent of each other (e.g., the components of g are independent of each other given their input h). This naturally enables a degree of parallelism in the implementation.
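A minimal sketch of the functional view in Python (the layer sizes and tanh activation follow the example above; the input dimension, weights and input vector are assumptions chosen at random):

    # Sketch: input x -> 3-dim h -> 2-dim g -> scalar f, each step a
    # nonlinear weighted sum with tanh as the activation function K.
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(3, 4))   # weights: 4-dim input -> hidden layer h
    W2 = rng.normal(size=(2, 3))   # weights: h -> g
    W3 = rng.normal(size=(1, 2))   # weights: g -> output f

    def f(x):
        h = np.tanh(W1 @ x)        # h(x): 3-dimensional
        g = np.tanh(W2 @ h)        # g(h): 2-dimensional
        return np.tanh(W3 @ g)     # f(g): scalar output

    print(f(np.array([0.5, -0.2, 0.1, 0.9])))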



[Figure: Two separate depictions of the recurrent ANN dependency graph]

Networks such as the previous one are commonly called feedforward, because their graph is a directed acyclic graph. Networks with cycles are commonly called recurrent. Such networks are commonly depicted in the manner shown at the top of the figure, where f is shown as being dependent upon itself. However, an implied temporal dependence is not shown.

Learning

What has attracted the most interest in neural networks is the possibility of learning. Given a specific task to solve, and a class of functions F, learning means using a set of observations to find f* ∈ F which solves the task in some optimal sense.

This entails defining a cost function C : F → R such that, for the optimal solution f*, C(f*) ≤ C(f) for all f ∈ F, i.e., no solution has a cost less than the cost of the optimal solution (see mathematical optimization).

The cost function C is an important concept in learning, as it is a measure of how far away a particular solution is from an optimal solution to the problem to be solved. Learning algorithms search through the solution space to find a function that has the smallest possible cost.

For applications where the solution is dependent on some data, the cost must necessarily be a function of the observations; otherwise we would not be modelling anything related to the data. It is frequently defined as a statistic to which only approximations can be made. As a simple example, consider the problem of finding the model f which minimizes C = E[(f(x) - y)^2], for data pairs (x, y) drawn from some distribution D. In practical situations we would only have N samples from D and thus, for the above example, we would only minimize Ĉ = (1/N) Σ_{i=1}^{N} (f(x_i) - y_i)^2. Thus, the cost is minimized over a sample of the data rather than the entire data set.
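A small numerical sketch of this point: the sample-based cost computed on N data pairs approximates the expected cost as N grows. The distribution, noise level and candidate model f below are assumptions for illustration.

    # Sketch: the empirical cost (1/N) * sum (f(x_i) - y_i)^2 approximates
    # the expected cost E[(f(x) - y)^2] as the sample size N grows.
    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: 2.0 * x                          # candidate model
    for N in (10, 100, 10000):
        x = rng.uniform(-1, 1, size=N)
        y = 2.0 * x + rng.normal(0, 0.1, size=N)   # data pairs (x, y) from D
        C_hat = np.mean((f(x) - y) ** 2)           # sample-based cost
        print(N, round(C_hat, 4))                  # approaches E[...] = 0.01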

When N → ∞, some form of online machine learning must be used, where the cost is partially minimized as each new example is seen. While online machine learning is often used when D is fixed, it is most useful in the case where the distribution changes slowly over time. In neural network methods, some form of online machine learning is frequently used for finite datasets.

Choosing a cost function

While it is possible to define some arbitrary, ad hoc cost function, frequently a particular cost will be used, either because it has desirable properties (such as convexity) or because it arises naturally from a particular formulation of the problem (e.g., in a probabilistic formulation the posterior probability of the model can be used as an inverse cost). Ultimately, the cost function will depend on the desired task. An overview of the three main categories of learning tasks is provided below.

Learning paradigms

There are three major learning paradigms, each corresponding to a particular abstract learning task. These are supervised learning, unsupervised learning and reinforcement learning.


Supervised learning

In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y, and the aim is to find a function f : X → Y in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain.

A commonly used cost is the mean-squared error, which tries to minimize the average squared error between the network's output, f(x), and the target value y over all the example pairs. When one tries to minimize this cost using gradient descent for the class of neural networks called multilayer perceptrons, one obtains the common and well-known backpropagation algorithm for training neural networks.
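A minimal sketch of this recipe, with an assumed one-hidden-layer network trained by gradient descent on the mean-squared error over the XOR examples (architecture, learning rate and iteration count are illustrative choices):

    # Sketch: backpropagation / gradient descent on MSE for a tiny MLP.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([[0], [1], [1], [0]], dtype=float)

    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for _ in range(5000):
        # forward pass
        H = sigmoid(X @ W1 + b1)
        out = sigmoid(H @ W2 + b2)
        # backward pass: gradients of the MSE through the sigmoids
        d_out = (out - Y) * out * (1 - out)
        d_H = (d_out @ W2.T) * H * (1 - H)
        # gradient-descent parameter updates
        W2 -= lr * H.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_H;    b1 -= lr * d_H.sum(axis=0)

    print(out.round(2).ravel())   # should approach [0, 1, 1, 0]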

Tasks that fall within the paradigm of supervised learning are pattern recognition (also known as classification) and regression (also known as function approximation). The supervised learning paradigm is also applicable to sequential data (e.g., for speech and gesture recognition). This can be thought of as learning with a "teacher," in the form of a function that provides continuous feedback on the quality of solutions obtained thus far.

Unsupervised learning

In unsupervised learning, some data x is given, and the cost function to be minimized can be any function of the data x and the network's output, f.

The cost function is dependent on the task (what we are trying to model) and our a priori assumptions (the implicit properties of our model, its parameters and the observed variables).

As a trivial example, consider the model f(x) = a, where a is a constant, and the cost C = E[(x - f(x))^2]. Minimizing this cost will give us a value of a that is equal to the mean of the data. The cost function can be much more complicated. Its form depends on the application: for example, in compression it could be related to the mutual information between x and f(x), whereas in statistical modeling, it could be related to the posterior probability of the model given the data. (Note that in both of those examples those quantities would be maximized rather than minimized.)
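The trivial example is easy to verify numerically. In the sketch below, gradient descent on the sample version of C = E[(x - a)^2] drives a to the sample mean; the data and step size are assumed.

    # Sketch: minimizing E[(x - a)^2] over a recovers the mean of the data.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=1.0, size=1000)   # some data

    a, lr = 0.0, 0.1
    for _ in range(200):
        grad = np.mean(2 * (a - x))   # dC/da for the sample-based cost
        a -= lr * grad
    print(a, x.mean())                # the two agree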

Tasks that fall within the paradigm of unsupervised learning are in general estimation problems; the applications include clustering, the estimation of statistical distributions, compression and filtering.

Reinforcement learning

In reinforcement learning, data x_t are usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of a long-term cost; i.e., the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated.

More formally, the environment is modeled as a Markov decision process (MDP) with states s_1, ..., s_n ∈ S and actions a_1, ..., a_m ∈ A, with the following probability distributions: the instantaneous cost distribution P(c_t | s_t), the observation distribution P(x_t | s_t) and the transition P(s_{t+1} | s_t, a_t), while a policy is defined as a conditional distribution over actions given the observations. Taken together, the two define a Markov chain (MC). The aim is to discover the policy that minimizes the cost; i.e., the MC for which the cost is minimal.
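As a toy illustration of discovering a cost-minimizing policy, the sketch below runs value iteration on a small assumed MDP with known transition probabilities and instantaneous costs (the text notes the dynamics are usually unknown and must be estimated; known dynamics are assumed here only to keep the example short).

    # Sketch: value iteration on an assumed 3-state, 2-action MDP,
    # minimizing discounted cumulative cost.
    import numpy as np

    gamma = 0.9
    # P[a][s, s'] = transition probability; cost[s, a] = instantaneous cost
    P = np.array([[[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],
                  [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]])
    cost = np.array([[2.0, 1.0], [1.0, 2.0], [0.5, 0.0]])

    V = np.zeros(3)
    for _ in range(500):
        # Bellman update: immediate cost plus discounted expected future cost
        Q = cost + gamma * np.einsum('ast,t->sa', P, V)
        V = Q.min(axis=1)
    policy = Q.argmin(axis=1)
    print(V.round(2), policy)   # value per state and the greedy policy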

ANNs are frequently used in reinforcement learning as part of the overall algorithm. Dynamic programming has been coupled with ANNs (neuro-dynamic programming) by Bertsekas and Tsitsiklis[2] and applied to multi-dimensional nonlinear problems such as those involved in vehicle routing or natural resources management, because of the ability of ANNs to mitigate losses of accuracy even when reducing the discretization grid density for numerically approximating the solution of the original control problems.

Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision making tasks.

Learning algorithms

Training a neural network model essentially means selecting one model from the set of allowed models (or, in a Bayesian framework, determining a distribution over the set of allowed models) that minimizes the cost criterion. There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of optimization theory and statistical estimation.

Most of the algorithms used in training artificial neural networks employ some form of gradient descent. This is done by simply taking the derivative of the cost function with respect to the network parameters and then changing those parameters in a gradient-related direction.
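A generic sketch of that recipe: the loop below estimates the gradient numerically (by central finite differences) and steps the parameters in the negative-gradient direction, so the same loop works for any cost function; the quadratic cost here is an assumed stand-in for a network's cost.

    # Sketch: gradient descent with a numerically estimated gradient.
    import numpy as np

    def cost(w):
        return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2   # minimum at (3, -1)

    def num_grad(f, w, eps=1e-6):
        g = np.zeros_like(w)
        for i in range(w.size):          # central difference per parameter
            e = np.zeros_like(w); e[i] = eps
            g[i] = (f(w + e) - f(w - e)) / (2 * eps)
        return g

    w = np.zeros(2)
    for _ in range(100):
        w -= 0.1 * num_grad(cost, w)     # step in a gradient-related direction
    print(w.round(3))                    # close to [3, -1]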

Evolutionary methods,[3] simulated annealing,[4] expectation-maximization, non-parametric methods and particle swarm optimization[5] are some commonly used methods for training neural networks.