The Future of Time Series


The Future of Time Series
Neil A. Gershenfeld
Andreas S. Weigend
SFI WORKING PAPER: 1993-08-053
SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the
views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or
proceedings volumes, but not papers that have already appeared in print. Except for papers by our external
faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or
funded by an SFI grant.
©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure
timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights
therein are maintained by the author(s). It is understood that all persons copying this information will
adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only
with the explicit permission of the copyright holder.
www.santafe.edu
SANTA FE INSTITUTE

Preface
This book is the result of an unsuccessful joke. During the summer of 1990, we were both participating in the Complex Systems Summer School of the Santa Fe Institute. Like many such programs dealing with "complexity," this one was full of exciting examples of how it can be possible to recognize when apparently complex behavior has a simple understandable origin. However, as is often the case in young disciplines, little effort was spent trying to understand how such techniques are interrelated, how they relate to traditional practices, and what the bounds on their reliability are. These issues must be addressed if suggestive results are to grow into a mature discipline. Problems were particularly apparent in time series analysis, an area that we both arrived at in our respective physics theses.

Out of frustration with the fragmented and anecdotal literature, we made what we thought was a humorous suggestion: run a competition. Much to our surprise, no one laughed and, to our further surprise, the Santa Fe Institute promptly agreed to support it. The rest is history (630 pages worth).

Reasons why a competition might be a bad idea abound: science is a thoughtful activity, not a simple race; the relevant disciplines are too dissimilar and the questions too difficult to permit meaningful comparisons; and the required effort might be prohibitively large in return for potentially misleading results. On the other hand, regardless of the very different techniques and language games of the different disciplines that study time series (physics, biology, economics, ...), very similar questions are asked: What will happen next? What kind of system produced the time series? How can it be described? How much can we know about the system? These questions can have quantitative answers that permit direct comparisons. And with the growing penetration of computer networks, it has become feasible to announce a competition, to distribute the data (withholding the continuations), and subsequently to collect and analyze the results. We began to realize that a competition might not be such a crazy idea.
The Santa Fe Institute seemed ideally placed to support such an undertaking. It spans many disciplines and addresses broad questions that do not easily fall within the purview of a single academic department. Following its initial commitment, we assembled a group of advisors[1] to represent many of the relevant disciplines in order to help us decide if and how to proceed. These initial discussions progressed to the collection of a large library of candidate data sets, the selection of a representative small subset, the specification of the competition tasks, and finally the publicizing and then running of the competition (which was remotely managed by Andreas in Bangkok and Neil in Cambridge, Massachusetts). After its close, we ran a NATO Advanced Research Workshop to bring together the advisory board, representatives of the groups that had provided the data, successful participants, and interested observers. This heterogeneous group was able to communicate using the common reference of the competition data sets; the result is this book. It aims to provide a snapshot of the range of new techniques that are currently used to study time series, both as a reference for experts and as a guide for novices.
Scanning the contents, we are struck by the variety of routes that lead people to study time series. This subject, which has a rather dry reputation from a distance (we certainly thought that), lies at the heart of the scientific enterprise of building models from observations. One of our goals was to help clarify how new time series techniques can be broadly applicable beyond the restricted domains within which they evolved (such as simple chaos experiments), and, at the same time, how theories of everything can be applicable to nothing given the limitations of real data.
We had another hidden agenda in running this competition. Any one such study can never be definitive, but our hope was that the real result would be planting a seed for an ongoing process of using new technology to share results in what is, in effect, a very large collective research project. The many papers in this volume that use the competition tasks as starting points for the broader and deeper study of these common data sets suggest that our hope might be fulfilled. This survey of what is possible is in no way meant to suggest that better results are impossible. We will be pleased if the Santa Fe data sets and results become common reference benchmarks, and even more pleased if they are later discarded and replaced by more worthy successors.

[1] The advisors were Leon Glass (biology), Clive Granger (economics), Bill Press (astrophysics and numerical analysis), Maurice Priestley (statistics), Itamar Procaccia (dynamical systems), T. Subba Rao (statistics), and Harry Swinney (experimental physics).
An undertaking such as this requires the assistance of more friends (and thoughtful critics) than we knew we had. We thank the members of the advisory board, the providers of the data sets, and the competition entrants for participating in a quixotic undertaking based on limited advance information. We thank the Santa Fe Institute and NATO for support.[2] We are grateful for the freedom provided by Stanford, Harvard, Chulalongkorn, MIT, and Xerox PARC. We thank Ronda Butler-Villa and Della Ulibarri for the heroic job of helping us assemble this book, and we thank the one hundred referees for their critical comments. We also thank our friends for not abandoning us despite the demands of this enterprise. Finally, we must thank each other for tolerating and successfully filtering each other's occasionally odd ideas about how to run a time series competition, which neither of us would have been able to do (or understand) alone.

July 1993

Neil Gershenfeld, Cambridge, MA
Andreas Weigend, San Francisco, CA
[2] Core funding for the Santa Fe Institute is provided by the John D. and Catherine T. MacArthur Foundation, the National Science Foundation, grant PHY-8714918, and the U.S. Department of Energy, grant ER-FG05-88ER25054.
TABLE OF CONTENTS

Abstract 1
1 Introduction 2
2 The Competition 4
3 Linear Time Series Models 11
3.1 ARMA, FIR, and all that 11
3.2 The Breakdown of Linear Models 16
4 Understanding and Learning 18
4.1 Understanding: State-Space Reconstruction 20
4.2 Learning: Neural Networks 25
5 Forecasting 30
5.1 State-Space Forecasting 30
5.2 Connectionist Forecasting 34
5.3 Beyond Point-Predictions 36
6 Characterization 42
6.1 Simple Tests 42
6.2 Direct Characterization via State Space 45
6.3 Indirect Characterization: Understanding through Learning 53
7 The Future 60
Appendix 63
Appendix to the Book: Accessing the Server
References
Neil A. Gershenfeld† and Andreas S. Weigend‡

†MIT Media Laboratory, 20 Ames Street, Cambridge, MA 02139; e-mail: neilg@media.mit.edu.
‡Xerox PARC, 3333 Coyote Hill Road, Palo Alto, CA 94304; e-mail: weigend@cs.colorado.edu.
Address after August 1993: Andreas Weigend, Department of Computer Science and Institute of Cognitive Science, University of Colorado, Boulder, CO 80309-0430.
The Future of Time Series: Learning and Understanding

Throughout scientific research, measured time series are the basis for characterizing an observed system and for predicting its future behavior. A number of new techniques (such as state-space reconstruction and neural networks) promise insights that traditional approaches to these very old problems cannot provide. In practice, however, the application of such new techniques has been hampered by the unreliability of their results and by the difficulty of relating their performance to that of mature algorithms. This chapter reports on a competition run through the Santa Fe Institute in which participants from a range of relevant disciplines applied a variety of time series analysis tools to a small group of common data sets in order to help make meaningful comparisons among their approaches. The design and the results of this competition are described, and the historical and theoretical backgrounds necessary to understand the successful entries are reviewed.
N. A. Gershenfeld and A. S. Weigend, "The Future of Time Series: Learning and Understanding." In: Time Series Prediction: Forecasting the Future and Understanding the Past, A. S. Weigend and N. A. Gershenfeld, eds. Reading, MA: Addison-Wesley, 1993.
1. INTRODUCTION
The desire to predict the future and understand the past drives the search for laws that explain the behavior of observed phenomena; examples range from the irregularity in a heartbeat to the volatility of a currency exchange rate. If there are known underlying deterministic equations, in principle they can be solved to forecast the outcome of an experiment based on knowledge of the initial conditions. To make a forecast if the equations are not known, one must find both the rules governing system evolution and the actual state of the system. In this chapter we will focus on phenomena for which underlying equations are not given; the rules that govern the evolution must be inferred from regularities in the past. For example, the motion of a pendulum or the rhythm of the seasons carry within them the potential for predicting their future behavior from knowledge of their oscillations without requiring insight into the underlying mechanism. We will use the terms "understanding" and "learning" to refer to two complementary approaches taken to analyze an unfamiliar time series. Understanding is based on explicit mathematical insight into how systems behave, and learning is based on algorithms that can emulate the structure in a time series. In both cases, the goal is to explain observations; we will not consider the important related problem of using knowledge about a system for controlling it in order to produce some desired behavior.
Time series analysis has three goals: forecasting, modeling, and characterization. The aim of forecasting (also called predicting) is to accurately predict the short-term evolution of the system; the goal of modeling is to find a description that accurately captures features of the long-term behavior of the system. These are not necessarily identical: finding governing equations with proper long-term properties may not be the most reliable way to determine parameters for good short-term forecasts, and a model that is useful for short-term forecasts may have incorrect long-term properties. The third goal, system characterization, attempts with little or no a priori knowledge to determine fundamental properties, such as the number of degrees of freedom of a system or the amount of randomness. This overlaps with forecasting but can differ: the complexity of a model useful for forecasting may not be related to the actual complexity of the system.
Before the 1920s, forecasting was done by simply extrapolating the series through a global fit in the time domain. The beginning of "modern" time series prediction might be set at 1927, when Yule invented the autoregressive technique in order to predict the annual number of sunspots. His model predicted the next value as a weighted sum of previous observations of the series. In order to obtain "interesting" behavior from such a linear system, outside intervention in the form of external shocks must be assumed. For the half-century following Yule, the reigning paradigm remained that of linear models driven by noise.
However, there are simple cases for which this paradigm is inadequate. For example, a simple iterated map, such as the logistic equation (Eq. (11) in Section 3.2), can generate a broadband power spectrum that cannot be obtained by a linear approximation. The realization that apparently complicated time series can be generated by very simple equations pointed to the need for a more general theoretical framework for time series analysis and prediction.
Two crucial developments occurred around 1980; both were enabled by the general availability of powerful computers that permitted much longer time series to be recorded, more complex algorithms to be applied to them, and the data and the results of these algorithms to be interactively visualized. The first development, state-space reconstruction by time-delay embedding, drew on ideas from differential topology and dynamical systems to provide a technique for recognizing when a time series has been generated by deterministic governing equations and, if so, for understanding the geometrical structure underlying the observed behavior. The second development was the emergence of the field of machine learning, typified by neural networks, that can adaptively explore a large space of potential models. With the shift in artificial intelligence from rule-based methods towards data-driven methods,[1] the field was ready to apply itself to time series, and time series, now recorded with orders of magnitude more data points than were available previously, were ready to be analyzed with machine-learning techniques requiring relatively large data sets.
The realization of the promise of these two approaches has been hampered by the lack of a general framework for the evaluation of progress. Because time series problems arise in so many disciplines, and because it is much easier to describe an algorithm than to evaluate its accuracy and its relationship to mature techniques, the literature in these areas has become fragmented and somewhat anecdotal. The breadth (and the range in reliability) of relevant material makes it difficult for new research to build on the accumulated insight of past experience (researchers standing on each other's toes rather than shoulders).
Global computer networks now offer a mechanism for the disjoint communities to attack common problems through the widespread exchange of data and information. In order to foster this process and to help clarify the current state of time series analysis, we organized the Santa Fe Time Series Prediction and Analysis Competition under the auspices of the Santa Fe Institute during the fall of 1991. The goal was not to pick "winners" and "losers," but rather to provide a structure for researchers from the many relevant disciplines to compare quantitatively the results of their analyses of a group of data sets selected to span the range of studied problems. To explore the results of the competition, a NATO Advanced Research Workshop was held in the spring of 1992; workshop participants included members of the competition advisory board, representatives of the groups that had collected the data, participants in the competition, and interested observers. Although the participants came from a broad range of disciplines, the discussions were framed by the analysis of common data sets and it was (usually) possible to find a meaningful common ground. In this overview chapter we describe the structure and the results of this competition and review the theoretical material required to understand the successful entries; much more detail is available in the articles by the participants in this volume.

[1] Data sets of hundreds of megabytes are routinely analyzed with massively parallel supercomputers, using parallel algorithms to find near neighbors in multidimensional spaces (K. Thearling, personal communication, 1992; Bourgoin et al., 1993).
2. THE COMPETITION
The planning for the competition emerged from informal discussions at the Complex Systems Summer School at the Santa Fe Institute in the summer of 1990; the first step was to assemble an advisory board to represent the interests of many of the relevant fields.[2] With the help of this group we gathered roughly 200 megabytes of experimental time series for possible use in the competition. This volume of data reflects the growth of techniques that use enormous data sets (where automatic collection and processing is essential) over traditional time series (such as quarterly economic indicators, where it is possible to develop an intimate relationship with each data point).

In order to be widely accessible, the data needed to be distributed by ftp over the Internet, by electronic mail, and by floppy disks for people without network access. The latter distribution channels limited the size of the competition data to a few megabytes; the final data sets were chosen to span as many of a desired group of attributes as possible given this size limitation (the attributes are shown in Figure 2). The final selection was:
A. A clean physics laboratory experiment. 1,000 points of the fluctuations in a far-infrared laser, approximately described by three coupled nonlinear ordinary differential equations (Hubner et al., this volume).

B. Physiological data from a patient with sleep apnea. 34,000 points of the heart rate, chest volume, blood oxygen concentration, and EEG state of a sleeping patient. These observables interact, but the underlying regulatory mechanism is not well understood (Rigney et al., this volume).

C. High-frequency currency exchange rate data. Ten segments of 3,000 points each of the exchange rate between the Swiss franc and the U.S. dollar. The average time between two quotes is between one and two minutes (Lequarre, this volume). If the market were efficient, such data should be a random walk.

[2] The advisors were Leon Glass (biology), Clive Granger (economics), Bill Press (astrophysics and numerical analysis), Maurice Priestley (statistics), Itamar Procaccia (dynamical systems), T. Subba Rao (statistics), and Harry Swinney (experimental physics).
FIGURE 1 Sections of the competition data sets.
FIGURE 2 Some attributes spanned by the data sets. Each set (A-F) is placed between the poles: natural vs. synthetic, stationary vs. nonstationary, low dimensional vs. stochastic, clean vs. noisy, short vs. long, documented vs. blind, linear vs. nonlinear, scalar vs. vector, one trial vs. many trials, continuous vs. discontinuous (switching, catastrophes, episodes), can dynamics make money?, can dynamics save lives?
D. A numerically generated series designed for this competition. A driven particle in a four-dimensional nonlinear multiple-well potential (nine degrees of freedom) with a small nonstationarity drift in the well depths. (Details are given in the Appendix.)

E. Astrophysical data from a variable star. 27,704 points in 17 segments of the time variation of the intensity of a variable white dwarf star, collected by the Whole Earth Telescope (Clemens, this volume). The intensity variation arises from a superposition of relatively independent spherical harmonic multiplets, and there is significant observational noise.

F. A fugue. J. S. Bach's final (unfinished) fugue from The Art of the Fugue, added after the close of the formal competition (Dirst and Weigend, this volume).
The amount of information available to the entrants about the origin of each data set varied from extensive (Data Sets B and E) to blind (Data Set D). The original files will remain available. The data sets are graphed in Figure 1, and some of their characteristics are summarized in Figure 2. The appropriate level of description for models of these data ranges from low-dimensional stationary dynamics to stochastic processes.
After selecting the data sets, we next chose competition tasks appropriate to the data sets and research interests. The participants were asked to: predict the (withheld) continuations of the data sets with respect to given error measures; characterize the systems (including aspects such as the number of degrees of freedom, predictability, noise characteristics, and the nonlinearity of the system); infer a model of the governing equations; and describe the algorithms employed.
The data sets and competition tasks were made publicly available on August 1, 1991, and competition entries were accepted until January 15, 1992.
Participants were required to describe their algorithms. (Insight in some previous competitions was hampered by the acceptance of proprietary techniques.) One interesting trend in the entries was the focus on prediction, for which three motivations were given: (i) because predictions are falsifiable, insight into a model used for prediction is verifiable; (ii) there are a variety of financial incentives to study prediction; and (iii) the growth of interest in machine learning brings with it the hope that there can be universally and easily applicable algorithms that can be used to generate forecasts. Another trend was the general failure of simplistic "black-box" approaches: in all successful entries, exploratory data analysis preceded the algorithm application.[3]
It is interesting to compare this time series competition to the previous state of the art as reflected in two earlier competitions (Makridakis & Hibon, 1979; Makridakis et al., 1984). In these, a very large number of time series was provided (111 and 1001, respectively), taken from business (forecasting sales), economics (predicting recovery from the recession), finance, and the social sciences. However, all of the series used were very short, generally less than 100 values long. Most of the algorithms entered were fully automated, and most of the discussion centered around linear models.[4] In the Santa Fe Competition all of the successful entries were fundamentally nonlinear and, even though significantly more computer power was used to analyze the larger data sets with more complex models, the application of the algorithms required more careful manual control than in the past.
[3] The data, analysis programs, and summaries of the results are available by anonymous ftp from ftp.santafe.edu, as described in the Appendix to this volume. In the competition period, on average 5 to 10 people retrieved the data per day, and 30 groups submitted final entries by the deadline. Entries came from the U.S., Europe (including former communist countries), and Asia, ranging from junior graduate students to senior researchers.

[4] These discussions focused on issues such as the order of the linear model. Chatfield (1988) summarizes previous competitions.
FIGURE 3 The two best predicted continuations for Data Set A, by Sauer and by Wan. Predicted values are indicated by "o", predicted error bars by vertical lines. The true continuation (not available at the time when the predictions were received) is shown in grey (the points are connected to guide the eye). [Two panels of x(t) against t for t = 1020 to 1100.]
FIGURE 4 Predictions obtained by the same two models as in the previous figure, but continued 500 points further into the future. The solid line connects the predicted points; the grey line indicates the true continuation. [Two panels of x(t) against t for t = 1000 to 1600.]
As an example of the results, consider the intensity of the laser (Data Set A; see Figure 1). On the one hand, the laser can be described by a relatively simple "correct" model of three nonlinear differential equations, the same equations that Lorenz (1963) used to approximate weather phenomena. On the other hand, since the 1,000-point training set showed only three of four collapses, it is difficult to predict the next collapse based on so few instances.
For this data set we asked for predictions of the next 100 points as well as estimates of the error bars associated with these predictions. We used two measures to evaluate the submissions. The first measure (normalized mean squared error) was based on the predicted values only; the second measure used the submitted error predictions to compute the likelihood of the observed data given the predictions. The Appendix to this chapter gives the definitions and explanations of the error measures as well as a table of all entries received.
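For readers who want to reproduce this kind of scoring, the following is a minimal sketch in Python (ours, not from the chapter; the authoritative definitions are in the Appendix). It assumes the common conventions: squared error normalized by the variance of the observed continuation, and an average Gaussian log-likelihood built from the submitted error bars.

    import numpy as np

    def nmse(observed, predicted):
        # Squared error normalized by the variance of the observed
        # continuation: predicting the mean scores about 1, perfection 0.
        return np.mean((observed - predicted) ** 2) / np.var(observed)

    def avg_log_likelihood(observed, predicted, sigma):
        # Average log-likelihood of the observations under Gaussian
        # predictive distributions centered at the predictions, with the
        # submitted error bars sigma as standard deviations.
        z = (observed - predicted) / sigma
        return np.mean(-0.5 * z ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi))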
We would like to point out a few interesting features. Although this single trial does not permit fine distinctions to be made between techniques with comparable performance, two techniques clearly did much better than the others for Data Set A; one used state-space reconstruction to build an explicit model for the dynamics and the other used a connectionist network (also called a neural network). Incidentally, a prediction based solely on visually examining and extrapolating the training data did much worse than the best techniques, but also much better than the worst.
Figure 3 shows the two best predictions. Sauer (this volume) attempts to understand and develop a representation for the geometry in the system's state space, which is the best that can be done without knowing something about the system's governing equations, while Wan (this volume) addresses the issue of function approximation by using a connectionist network to learn to emulate the input-output behavior. Both methods generated remarkably accurate predictions for the specified task. In terms of the measures defined for the competition, Wan's squared errors are one-third as large as Sauer's and, taking the predicted uncertainty into account, Wan's model is four times more likely than Sauer's.[5] According to the competition scores for Data Set A, this puts Wan's network in the first place.
A different picture, which cautions the hurried researcher against declaring one method to be universally superior to another, emerges when one examines the evolution of these two prediction methods further into the future. Figure 4 shows the same two predictors, but now the continuations extend 500 points beyond the 100 points submitted for the competition entry (no error estimates are shown).[6] The neural network's class of potential behavior is much broader than what can be generated from a small set of coupled ordinary differential equations, but the state-space model is able to reliably forecast the data much further because its explicit description can correctly capture the character of the long-term dynamics.
[5] The likelihood ratio can be obtained from Table 2 in the Appendix as exp(-3.5)/exp(-4.8).

[6] Furthermore, we invite the reader to compare Figure 5 by Sauer (this volume, p. 191) with Figure 13 by Wan (this volume, p. 213). Both entrants start the competition model at the same four (new) different points. The squared errors are compared in the Table on p. 192 of this book.
In order to understand the details of these approaches, we will detour to review the framework for (and then the failure of) linear time series analysis.
3. LINEAR TIME SERIES MODELS
Linear time series models have two particularly desirable features: they can be understood in great detail and they are straightforward to implement. The penalty for this convenience is that they may be entirely inappropriate for even moderately complicated systems. In this section we will review their basic features and then consider why and how such models fail. The literature on linear time series analysis is vast; a good introduction is the very readable book by Chatfield (1989), many derivations can be found (and understood) in the comprehensive text by Priestley (1981), and a classic reference is Box and Jenkins' book (1976). Historically, the general theory of linear predictors can be traced back to Kolmogorov (1941) and to Wiener (1949).
Two crucial assumptions will be made in this section: the system is assumed to be linear and stationary. In the rest of this chapter we will say a great deal about relaxing the assumption of linearity; much less is known about models that have coefficients that vary with time. To be precise, unless explicitly stated (such as for Data Set D), we assume that the underlying equations do not change in time, i.e., time invariance of the system.
3.1 ARMA, FIR, AND ALL THAT
There are two complementary tasks that need to be discussed: understanding how a given model behaves, and finding a particular model that is appropriate for a given time series. We start with the former task. It is simplest to discuss separately the role of external inputs (moving average models) and internal memory (autoregressive models).
3.1.1 PROPERTIES OF A GIVEN LINEAR MODEL.
Moving average (MA) models. Assume we are given an external input series {e_t} and want to modify it to produce another series {x_t}. Assuming linearity of the system and causality (the present value of x is influenced by the present and N past values of the input series e), the relationship between the input and output is

x_t = \sum_{n=0}^{N} b_n e_{t-n} = b_0 e_t + b_1 e_{t-1} + \cdots + b_N e_{t-N} .    (1)
This equation describes a convolution filter: the new series x is generated by an Nth-order filter with coefficients b_0, ..., b_N from the series e. Statisticians and econometricians call this an Nth-order moving average model, MA(N). The origin of this (sometimes confusing) terminology can be seen if one pictures a simple smoothing filter which averages the last few values of series e. Engineers call this a finite impulse response (FIR) filter, because the output is guaranteed to go to zero at N time steps after the input becomes zero.
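As a concrete illustration (a sketch of ours, not part of the original text), an MA(N) model is a single convolution; here a hypothetical white-noise input is passed through a three-tap filter:

    import numpy as np

    rng = np.random.default_rng(0)
    e = rng.normal(size=1000)            # external input series {e_t}: white noise
    b = np.array([0.5, 0.3, 0.2])        # filter coefficients b_0, b_1, b_2

    # Eq. (1) for N = 2: x_t = b_0 e_t + b_1 e_{t-1} + b_2 e_{t-2},
    # with zero initial conditions before the input starts.
    x = np.convolve(e, b)[:len(e)]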
Properties of the output series x clearly depend on the input series e. The question is whether there are characteristic features independent of a specific input sequence. For a linear system, the response of the filter is independent of the input. A characterization focuses on properties of the system, rather than on properties of the time series. (For example, it does not make sense to attribute linearity to a time series itself, only to a system.)

We will give three equivalent characterizations of an MA model: in the time domain (the impulse response of the filter), in the frequency domain (its spectrum), and in terms of its autocorrelation coefficients.
In the first case, we assume that the input is nonzero only at a single time step t_0 and that it vanishes for all other times. The response (in the time domain) to this "impulse" is simply given by the b's in Eq. (1): at each time step the impulse moves up to the next coefficient until, after N steps, the output disappears. The series b_N, b_{N-1}, ..., b_0 is thus the impulse response of the system. The response to an arbitrary input can be computed by superimposing the responses at appropriate delays, weighted by the respective input values ("convolution"). The transfer function thus completely describes a linear system, i.e., a system where the superposition principle holds: the output is determined by impulse response and input.
Sometimes it is more convenient to describe the filter in the frequency domain. This is useful (and simple) because a convolution in the time domain becomes a product in the frequency domain. If the input to an MA model is an impulse (which has a flat power spectrum), the discrete Fourier transform of the output is given by \sum_{n=0}^{N} b_n e^{-i 2\pi n f} (see, for example, Box & Jenkins, 1976, p. 69). The power spectrum is given by the squared magnitude of this:

\left| \sum_{n=0}^{N} b_n e^{-i 2\pi n f} \right|^2 .    (2)
The third way of representing yet again the same information is in terms of the autocorrelation coefficients, defined in terms of the mean \mu = \langle x_t \rangle and the variance \sigma^2 = \langle (x_t - \mu)^2 \rangle by

\rho_\tau = \frac{1}{\sigma^2} \langle (x_t - \mu)(x_{t-\tau} - \mu) \rangle .    (3)
".
The
angularbrackets
(-)
denoteexpectationvalues,
in
the
statisticsliteratureoften
indicatedbyE{.}.
The
autocorrelationcoefficientsdescribehowmuch,onaverage,
twovalues
of
aseries
that
are
T
timesteps
apart
co-varywitheachother.
(We
willlaterreplacethislinearmeasurewith
mutual
information,suitedalso
to
de­
scribenonlinearrelations.)
If
the
input
to
the
system
is
astochasticprocesswith
TheFutureofTimeSeries:LeamingandUnderstanding
13
inputvalues
at
differenttimesuncorrelated,
(eiej)
=
0for
i
".
j,
then
all
of
the
crosstermswilldisappearfromtheexpectationvalueinEq.(3),
and
the
resulting
autocorrelationcoefficientsare
ITI
5.N,
ITI
>N.
(4)
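A quick numerical check of Eq. (4) (again a sketch, assuming unit-variance white-noise input):

    import numpy as np

    rng = np.random.default_rng(1)
    b = np.array([0.5, 0.3, 0.2])                # an MA(2) model
    e = rng.normal(size=200_000)
    x = np.convolve(e, b)[:len(e)]

    def rho(x, tau):
        # empirical autocorrelation coefficient at lag tau, Eq. (3)
        x = x - x.mean()
        return np.mean(x[tau:] * x[:len(x) - tau]) / np.var(x)

    for tau in (1, 2, 3):
        if tau < len(b):
            theory = (b[tau:] * b[:len(b) - tau]).sum() / (b ** 2).sum()
        else:
            theory = 0.0                         # lags beyond N = 2 vanish
        print(tau, round(rho(x, tau), 3), round(theory, 3))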
Autoregressive (AR) models. MA (or FIR) filters operate in an open loop without feedback; they can only transform an input that is applied to them. If we do not want to drive the series externally, we need to provide some feedback (or memory) in order to generate internal dynamics:

x_t = \sum_{m=1}^{M} a_m x_{t-m} + e_t .    (5)
This is called an Mth-order autoregressive model (AR(M)) or an infinite impulse response (IIR) filter (because the output can continue after the input ceases). Depending on the application, e_t can represent either a controlled input to the system or noise. As before, if e is white noise, the autocorrelations of the output series x can be expressed in terms of the model coefficients. Here, however, due to the feedback coupling of previous steps, we obtain a set of linear equations rather than just a single equation for each autocorrelation coefficient. By multiplying Eq. (5) by x_{t-n}, taking expectation values, and normalizing (see Box & Jenkins, 1976, p. 54), the autocorrelation coefficients of an AR model are found by solving this set of linear equations, traditionally called the Yule-Walker equations,

\rho_\tau = \sum_{m=1}^{M} a_m \rho_{\tau-m} ,  \tau > 0 .    (6)
Unlike the MA case, the autocorrelation coefficients need not vanish after M steps. Taking the Fourier transform of both sides of Eq. (5) and rearranging terms shows that the output equals the input times (1 - \sum_{m=1}^{M} a_m e^{-i 2\pi m f})^{-1}. The power spectrum of the output is thus that of the input times

\frac{1}{\left| 1 - \sum_{m=1}^{M} a_m e^{-i 2\pi m f} \right|^2} .    (7)
To generate a specific realization of the series, we must specify the initial conditions, usually by the first M values of series x. Beyond that, the input term e_t is crucial for the life of an AR model. If there were no input, we might be disappointed by the series we get: depending on the amount of feedback, after iterating it for a while, the output produced can only decay to zero, diverge, or oscillate periodically.[7]
Clearly, the next step in complexity is to allow both AR and MA parts in the model; this is called an ARMA(M, N) model:

x_t = \sum_{m=1}^{M} a_m x_{t-m} + \sum_{n=0}^{N} b_n e_{t-n} .    (8)
Its output is most easily understood in terms of the z-transform (Oppenheim & Schafer, 1989), which generalizes the discrete Fourier transform to the complex plane:

X(z) = \sum_{t=-\infty}^{\infty} x_t z^t .    (9)
On the unit circle, z = e^{-i 2\pi f}, the z-transform reduces to the discrete Fourier transform. Off the unit circle, the z-transform measures the rate of divergence or convergence of a series. Since the convolution of two series in the time domain corresponds to the multiplication of their z-transforms, the z-transform of the output of an ARMA model is

X(z) = A(z) X(z) + B(z) E(z) = \frac{B(z)}{1 - A(z)} E(z)    (10)

(ignoring a term that depends on the initial conditions).
The input z-transform E(z) is multiplied by a transfer function that is unrelated to it; the transfer function will vanish at zeros of the MA term (B(z) = 0) and diverge at poles (A(z) = 1) due to the AR term (unless cancelled by a zero in the numerator). As A(z) is an Mth-order complex polynomial, and B(z) is Nth-order, there will be M poles and N zeros. Therefore, the z-transform of a time series produced by Eq. (8) can be decomposed into a rational function and a remaining (possibly continuous) part due to the input. The number of poles and zeros determines the number of degrees of freedom of the system (the number of previous states that the dynamics retains). Note that since only the ratio enters, there is no unique ARMA model. In the extreme cases, a finite-order AR model can always be expressed by an infinite-order MA model, and vice versa.
ARMA models have dominated all areas of time series analysis and discrete-time signal processing for more than half a century. For example, in speech recognition and synthesis, Linear Predictive Coding (Press et al., 1992, p. 571) compresses speech by transmitting the slowly varying coefficients for a linear model (and possibly the remaining error between the linear forecast and the desired signal) rather than the original signal. If the model is good, it transforms the signal into a small number of coefficients plus residual white noise (of one kind or another).

[7] In the case of a first-order AR model, this can easily be seen: if the absolute value of the coefficient is less than unity, the value of x exponentially decays to zero; if it is larger than unity, it exponentially explodes. For higher-order AR models, the long-term behavior is determined by the locations of the zeroes of the polynomial with coefficients a_i.
3.1.2 FITTING A LINEAR MODEL TO A GIVEN TIME SERIES
Fitting the coefficients. The Yule-Walker set of linear equations (Eq. (6)) allowed us to express the autocorrelation coefficients of a time series in terms of the AR coefficients that generated it. But there is a second reading of the same equations: they also allow us to estimate the coefficients of an AR(M) model from the observed correlational structure of an observed signal.[8] An alternative approach views the estimation of the coefficients as a regression problem: expressing the next value as a function of M previous values, i.e., linearly regress x_{t+1} onto {x_t, x_{t-1}, ..., x_{t-(M-1)}}. This can be done by minimizing squared errors: the parameters are determined such that the squared difference between the model output and the observed value, summed over all time steps in the fitting region, is as small as possible. There is no comparably simple conceptual expression for finding MA and full ARMA coefficients from observed data. For all cases, however, standard techniques exist, often expressed as efficient recursive procedures (Box & Jenkins, 1976; Press et al., 1992).
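The regression reading is a few lines of linear algebra; a sketch (the function name fit_ar is ours, not the chapter's):

    import numpy as np

    def fit_ar(x, M):
        # Least-squares estimate of AR(M) coefficients: linearly regress
        # x_t onto the M previous values, minimizing the summed squared
        # difference between model output and observation.
        X = np.column_stack([x[M - m: len(x) - m] for m in range(1, M + 1)])
        y = x[M:]
        a, *_ = np.linalg.lstsq(X, y, rcond=None)
        return a                          # estimates of (a_1, ..., a_M)

Applied to the AR(2) series simulated earlier, fit_ar(x, 2) recovers (0.6, -0.3) up to sampling error.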
Although there is no reason to expect that an arbitrary signal was produced by a system that can be written in the form of Eq. (8), it is reasonable to attempt to approximate a linear system's true transfer function (z-transform) by a ratio of polynomials, i.e., an ARMA model. This is a problem in function approximation, and it is well known that a suitable sequence of ratios of polynomials (called Pade approximants; see Press et al., 1992, p. 200) converges faster than a power series for arbitrary functions.
Selecting the (order of the) model. So far we have dealt with the question of how to estimate the coefficients from data for an ARMA model of order (M, N), but have not addressed the choice for the order of the model. There is not a unique best choice for the values, or even for the number, of coefficients to model a data set: as the order of the model is increased, the fitting error decreases, but the test error of the forecasts beyond the training set will usually start to increase at some point because the model will be fitting extraneous noise in the system. There are several heuristics to find the "right" order (such as the Akaike Information Criterion (AIC), Akaike, 1970; Sakamoto et al., 1986), but these heuristics rely heavily on the linearity of the model and on assumptions about the distribution from which the errors are drawn. When it is not clear whether these assumptions hold, a simple approach (but wasteful in terms of the data) is to hold back some of the training data and use these to evaluate the performance of competing models. Model selection is a general problem that will reappear even more forcefully in the context of nonlinear models, because they are more flexible and, hence, more capable of modeling irrelevant noise.

[8] In statistics, it is common to emphasize the difference between a given model and an estimated model by using different symbols, such as a-hat for the estimated coefficients of an AR model. In this paper, we avoid introducing another set of symbols; we hope that it is clear from the context whether values are theoretical or estimated.
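A sketch of the simple holdout approach described above, reusing the hypothetical fit_ar from the previous sketch:

    import numpy as np

    def select_order(x, max_M, frac_train=0.75):
        # Fit AR models of increasing order on an initial segment and pick
        # the order with the smallest one-step squared error on the
        # held-back remainder (simple, but wasteful in terms of data).
        split = int(len(x) * frac_train)
        best_M, best_err = None, np.inf
        for M in range(1, max_M + 1):
            a = fit_ar(x[:split], M)
            X = np.column_stack([x[split - m: len(x) - m]
                                 for m in range(1, M + 1)])
            err = np.mean((x[split:] - X @ a) ** 2)
            if err < best_err:
                best_M, best_err = M, err
        return best_M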
3.2 THE BREAKDOWN OF LINEAR MODELS
We have seen that ARMA coefficients, power spectra, and autocorrelation coefficients contain the same information about a linear system that is driven by uncorrelated white noise. Thus, if and only if the power spectrum is a useful characterization of the relevant features of a time series, an ARMA model will be a good choice for describing it. This appealing simplicity can fail entirely for even simple nonlinearities if they lead to complicated power spectra (as they can). Two time series can have very similar broadband spectra but can be generated from systems with very different properties, such as a linear system that is driven stochastically by external noise, and a deterministic (noise-free) nonlinear system with a small number of degrees of freedom. One of the key problems addressed in this chapter is how these cases can be distinguished: linear operators definitely will not be able to do the job.
Let us consider two nonlinear examples of discrete-time maps (like an AR model, but now nonlinear):

The first example can be traced back to Ulam (1957): the next value of a series is derived from the present one by a simple parabola

x_{t+1} = \lambda x_t (1 - x_t) .    (11)

Popularized in the context of population dynamics as an example of a "simple mathematical model with very complicated dynamics" (May, 1976), it has been found to describe a number of controlled laboratory systems such as hydrodynamic flows and chemical reactions, because of the universality of smooth unimodal maps (Collet, 1980). In this context, this parabola is called the logistic map or quadratic map. The value x_t deterministically depends on the previous value x_{t-1}; \lambda is a parameter that controls the qualitative behavior, ranging from a fixed point (for small values of \lambda) to deterministic chaos. For example, for \lambda = 4, each iteration destroys one bit of information. Consider that, by plotting x_t against x_{t-1}, each value of x_t has two equally likely predecessors or, equally well, the average slope (its absolute value) is two: if we know the location within \epsilon before the iteration, we will on average know it only within 2\epsilon afterwards. This exponential increase in uncertainty is the hallmark of deterministic chaos ("divergence of nearby trajectories").
The second example is equally simple: consider the time series generated by the map

x_t = 2 x_{t-1} (mod 1) .    (12)

The action of this map is easily understood by considering the position x_t written in a binary fractional expansion (i.e., x_t = 0.d_1 d_2 ... = (d_1 \times 2^{-1}) + (d_2 \times 2^{-2}) + ...): each iteration shifts every digit one place to the left (d_i \leftarrow d_{i+1}). This means that the most significant digit d_1 is discarded and one more digit of the binary expansion of the initial condition is revealed. This map can be implemented in a simple physical system consisting of a classical billiard ball and reflecting surfaces, where the x_t are the successive positions at which the ball crosses a given line (Moore, 1991).
Both systems are completely deterministic (their evolutions are entirely determined by the initial condition x_0), yet they can easily generate time series with broadband power spectra. In the context of an ARMA model a broadband component in a power spectrum of the output must come from external noise input to the system, but here it arises in two one-dimensional systems as simple as a parabola and two straight lines. Nonlinearities are essential for the production of "interesting" behavior in a deterministic system; the point here is that even simple nonlinearities suffice.
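This is easy to verify numerically; a minimal sketch iterating Eq. (11) at \lambda = 4:

    import numpy as np

    x = np.empty(4096)
    x[0] = 0.4
    for t in range(1, len(x)):
        x[t] = 4.0 * x[t - 1] * (1.0 - x[t - 1])   # Eq. (11) with lambda = 4

    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    # The power spectrum is broadband, like that of driven noise, yet the
    # series is fully determined by x[0]: plotting x[t] against x[t-1]
    # collapses the apparent randomness onto a single parabola.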
Historically, an important step beyond linear models for prediction was taken in 1980 by Tong and Lim (see also Tong, 1990). After more than five decades of approximating a system with one globally linear function, they suggested the use of two functions. This threshold autoregressive model (TAR) is globally nonlinear: it consists of choosing one of two local linear autoregressive models based on the value of the system's state. From here, the next step is to use many local linear models; however, the number of such regions that must be chosen may be very large if the system has even quadratic nonlinearities (such as the logistic map). A natural extension of Eq. (8) for handling this is to include quadratic and higher order powers in the model; this is called a Volterra series (Volterra, 1959).

TAR models, Volterra models, and their extensions significantly expand the scope of possible functional relationships for modeling time series, but these come at the expense of the simplicity with which linear models can be understood and fit to data. For nonlinear models to be useful, there must be a process that exploits features of the data to guide (and restrict) the construction of the model; lack of insight into this problem has limited the use of nonlinear time series models. In the next sections we will look at two complementary solutions to this problem: building explicit models with state-space reconstruction, and developing implicit models in a connectionist framework. To understand why both of these approaches exist and why they are useful, let us consider the nature of scientific modeling.
4. UNDERSTANDING AND LEARNING
Strong models have strong assumptions. They are usually expressed in a few equations with a few parameters, and can often explain a plethora of phenomena. In weak models, on the other hand, there are only a few domain-specific assumptions. To compensate for the lack of explicit knowledge, weak models usually contain many more parameters (which can make a clear interpretation difficult). It can be helpful to conceptualize models in the two-dimensional space spanned by the axes data-poor vs. data-rich and theory-poor vs. theory-rich. Due to the dramatic expansion of the capability for automatic data acquisition and processing, it is increasingly feasible to venture into the theory-poor and data-rich domain.
Strong models are clearly preferable, but they often originate in weak models. (However, if the behavior of an observed system does not arise from simple rules, they may not be appropriate.) Consider planetary motion (Gingerich, 1992). Tycho Brahe's (1546-1601) experimental observations of planetary motion were accurately described by Johannes Kepler's (1571-1630) phenomenological laws; this success helped lead to Isaac Newton's (1642-1727) simpler but much more general theory of gravity which could derive these laws; Henri Poincare's (1854-1912) inability to solve the resulting three-body gravitational problem helped lead to the modern theory of dynamical systems and, ultimately, to the identification of chaotic planetary motion (Sussman & Wisdom, 1988, 1992).
As in the previous section on linear systems, there are two complementary tasks: discovering the properties of a time series generated from a given model, and inferring a model from observed data. We focus here on the latter, but there has been comparable progress for the former. Exploring the behavior of a model has become feasible in interactive computer environments, such as Cornell's dstool,[9] and the combination of traditional numerical algorithms with algebraic, geometric, symbolic, and artificial intelligence techniques is leading to automated platforms for exploring dynamics (Abelson, 1990; Yip, 1991; Bradley, 1992). For a nonlinear system, it is no longer possible to decompose an output into an input signal and an independent transfer function (and thereby find the correct input signal to produce a desired output), but there are adaptive techniques for controlling nonlinear systems (Hubler, 1989; Ott, Grebogi & Yorke, 1990) that make use of techniques similar to the modeling methods that we will describe.

[9] Available by anonymous ftp from macomb.tn.cornell.edu in pub/dstool.
The idea of weak modeling (data-rich and theory-poor) is by no means new; an ARMA model is a good example. What is new is the emergence of weak models (such as neural networks) that combine broad generality with insight into how to manage their complexity. For such models with broad approximation abilities and few specific assumptions, the distinction between memorization and generalization becomes important. Whereas the signal-processing community sometimes uses the term learning for any adaptation of parameters, we need to contrast learning without generalization with learning with generalization. Let us consider the widely and wildly celebrated fact that neural networks can learn to implement the exclusive OR (XOR). But what kind of learning is this? When four out of four cases are specified, no generalization exists! Learning a truth table is nothing but rote memorization: learning XOR is as interesting as memorizing the phone book. More interesting, and more realistic, are real-world problems, such as the prediction of financial data. In forecasting, nobody cares how well a model fits the training data; only the quality of future predictions counts, i.e., the performance on novel data, or the generalization ability. Learning means extracting regularities from training examples that do transfer to new examples.
Learning procedures are, in essence, statistical devices for performing inductive inference. There is a tension between two goals. The immediate goal is to fit the training examples, suggesting devices as general as possible so that they can learn a broad range of problems. In connectionism, this suggests large and flexible networks, since networks that are too small might not have the complexity needed to model the data. The ultimate goal of an inductive device is, however, its performance on cases it has not yet seen, i.e., the quality of its predictions outside the training set. This suggests, at least for noisy training data, networks that are not too large, since networks with too many high-precision weights will pick out idiosyncrasies of the training set and will not generalize well.
An instructive example is polynomial curve fitting in the presence of noise. On the one hand, a polynomial of too low an order cannot capture the structure present in the data. On the other hand, a polynomial of too high an order, going through all of the training points and merely interpolating between them, captures the noise as well as the signal and is likely to be a very poor predictor for new cases. This problem of fitting the noise in addition to the signal is called overfitting. By employing a regularizer (i.e., a term that penalizes the complexity of the model) it is often possible to fit the parameters and to select the relevant variables at the same time. Neural networks, for example, can be cast in such a Bayesian framework (Buntine & Weigend, 1991).
To clearly separate memorization from generalization, the true continuation of the competition data was kept secret until the deadline, ensuring that the continuation data could not be used by the participants for tasks such as parameter estimation or model selection.[10] Successful forecasts of the withheld test set (also called out-of-sample predictions) from the provided training set (also called fitting set) were produced by two general classes of techniques: those based on state-space reconstruction (which make use of explicit understanding of the relationship between the internal degrees of freedom of a deterministic system and an observable of the system's state in order to build a model of the rules governing the measured behavior of the system), and connectionist modeling (which uses potentially rich models along with learning algorithms to develop an implicit model of the system). We will see that neither is uniquely preferable. The domains of applicability are not the same, and the choice of which to use depends on the goals of the analysis (such as an understandable description vs. accurate short-term forecasts).

[10] After all, predictions are hard, particularly those concerning the future.
4.1 UNDERSTANDING: STATE-SPACE RECONSTRUCTION
Yule's original idea for forecasting was that future predictions can be improved by using immediately preceding values. An ARMA model, Eq. (8), can be rewritten as a dot product between vectors of the time-lagged variables and coefficients:

x_t = \mathbf{a} \cdot \mathbf{x}_{t-1} + \mathbf{b} \cdot \mathbf{e}_t ,    (13)

where \mathbf{x}_t = (x_t, x_{t-1}, ..., x_{t-(d-1)}) and \mathbf{a} = (a_1, a_2, ..., a_d). (We slightly change notation here: what was M (the order of the AR model) is now called d (for dimension).) Such lag vectors, also called tapped delay lines, are used routinely in the context of signal processing and time series analysis, suggesting that they are more than just a typographical convenience.[11]
In fact, there is a deep connection between time-lagged vectors and underlying dynamics. This connection was proposed in 1980 by Ruelle (personal communication), Packard et al. (1980), and Takens (1981; he published the first proof), and later strengthened by Sauer et al. (1991). Delay vectors of sufficient length are not just a representation of the state of a linear system; it turns out that delay vectors can recover the full geometrical structure of a nonlinear system. These results address the general problem of inferring the behavior of the intrinsic degrees of freedom of a system when a function of the state of the system is measured.
If the governing equations and the functional form of the observable are known in advance, then a Kalman filter is the optimal linear estimator of the state of the system (Catlin, 1989; Chatfield, 1989). We, however, focus on the case where there is little or no a priori information available about the origin of the time series. There are four relevant (and easily confused) spaces and dimensions for this discussion:[12]
1. The configuration space of a system is the space "where the equations live." It specifies the values of all of the potentially accessible physical degrees of freedom of the system. For example, for a fluid governed by the Navier-Stokes partial differential equations, these are the infinite-dimensional degrees of freedom associated with the continuous velocity, pressure, and temperature fields.

2. The solution manifold is where "the solution lives," i.e., the part of the configuration space that the system actually explores as its dynamics unfolds (such as the support of an attractor or an integral surface). Due to unexcited or correlated degrees of freedom, this can be much smaller than the configuration space; the dimension of the solution manifold is the number of parameters that are needed to uniquely specify a distinguishable state of the overall system. For example, in some regimes the infinite physical degrees of freedom of a convecting fluid reduce to a small set of coupled ordinary differential equations for a mode expansion (Lorenz, 1963). Dimensionality reduction from the configuration space to the solution manifold is a common feature of dissipative systems: dissipation in a system will reduce its dynamics onto a lower dimensional subspace (Temam, 1988).

3. The observable is a (usually) one-dimensional function of the variables of configuration; an example is Eq. (51) in the Appendix. In an experiment, this might be the temperature or a velocity component at a point in the fluid.

4. The reconstructed state space is obtained from that (scalar) observable by combining past values of it to form a lag vector (which for the convection case would aim to recover the evolution of the components of the mode expansion).

[11] For example, the spectral test for random number generators is based on looking for structure in the space of lagged vectors of the output of the source; these will lie on hyperplanes for a linear congruential generator x_{t+1} = a x_t + b (mod c) (Knuth, 1981, p. 90).

[12] The first point (configuration space and potentially accessible degrees of freedom) will not be used again in this chapter. On the other hand, the dimension of the solution manifold (the actual degrees of freedom) will be important both for characterization and for prediction.
Given a time series measured from such a system, and no other information about the origin of the time series, the question is: What can be deduced about the underlying dynamics?
Let y be the state vector on the solution manifold (in the convection example the components of y are the magnitude of each of the relevant modes), let dy/dt = f(y) be the governing equations, and let the measured quantity be x_t = x(y(t)) (e.g., the temperature at a point). The results to be cited here also apply to systems that are described by iterated maps. Given a delay time \tau and a dimension d, a lag vector x can be defined,

lag vector:  \mathbf{x}_t = (x_t, x_{t-\tau}, ..., x_{t-(d-1)\tau}) .    (14)

The central result is that the behavior of x and y will differ only by a smooth local invertible change of coordinates (i.e., the mapping between x and y is an embedding, which requires that it be diffeomorphic) for almost every possible choice of f(y), x(y), and \tau, as long as d is large enough (in a way that we will make precise), x depends on at least some of the components of y, and the remaining components of y are coupled by the governing equations to the ones that influence x.
The proof of this result has two parts: a local piece, showing that the linearization of the embedding map is almost always nondegenerate, and a global part, showing that this holds everywhere. If \tau tends to zero, the embedding will tend to lie on the diagonal of the embedding space and, as \tau is increased, it sets a length scale for the reconstructed dynamics. There can be degenerate choices for \tau for which the embedding fails (such as choosing it to be exactly equal to the period of a periodic system), but these degeneracies almost always will be removed by an arbitrary perturbation of \tau. The intrinsic noise in physical systems guarantees that these results hold in all known nontrivial examples, although in practice, if the coupling between degrees of freedom is sufficiently weak, then the available experimental resolution will not be large enough to detect them (see Casdagli et al., 1991, for further discussion of how noise constrains embedding).[13]

[13] The Whitney embedding theorem from the 1930s (see Guillemin & Pollack, 1974, p. 48) guarantees that the number of independent observations d required to embed an arbitrary manifold (in the absence of noise) into a Euclidean embedding space will be no more than twice the dimension of the manifold. For example, a two-dimensional Mobius strip can be embedded in a three-dimensional Euclidean space, but a two-dimensional Klein bottle requires a four-dimensional space.

FIGURE 5 Stereo pairs for the three-dimensional embedding of Data Set A, for 1,000 points and for 25,000 points. The shape of the surface is apparent with just the 1,000 points that were given.
Data Set A appears complicated when plotted as a time series (Figure 1). The simple structure of the system becomes visible in its three-dimensional embedding (Figure 5). In contrast, high-dimensional dynamics would show up as a structureless cloud in such a stereo plot. Simply plotting the data in a stereo plot allows one to guess a value for the dimension of the manifold of around two, not far from computed values of 2.0-2.2. In Section 6, we will discuss in detail the practical issues associated with choosing and understanding the embedding parameters.
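The stereo pairs themselves are straightforward to produce: plot the embedded points twice, with the viewpoint rotated by a few degrees between the two panels. A minimal matplotlib sketch (our own illustration; delay_embed is the hypothetical helper sketched above):

```python
import matplotlib.pyplot as plt

def stereo_pair(X, angle=4.0):
    """Plot two 3-D views of the embedded points, rotated by `angle` degrees,
    for cross-eyed or wall-eyed stereo viewing."""
    fig = plt.figure(figsize=(8, 4))
    for i, az in enumerate((-angle / 2, angle / 2)):
        ax = fig.add_subplot(1, 2, i + 1, projection="3d")
        ax.scatter(X[:, 0], X[:, 1], X[:, 2], s=1)
        ax.view_init(elev=20, azim=az)
        ax.set_axis_off()
    plt.show()

# stereo_pair(delay_embed(series, d=3, tau=1))
```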
Time-delay embedding differs from traditional experimental measurements in three fundamental respects:

1. It provides detailed information about the behavior of degrees of freedom other than the one that is directly measured.
2. It rests on probabilistic assumptions and, although it has been routinely and reliably used in practice, it is not guaranteed to be valid for any system.
3. It allows precise questions only about quantities that are invariant under such a transformation, since the reconstructed dynamics have been modified by an unknown smooth change of coordinates.
This last restriction may be unfamiliar, but it is surprisingly unimportant: we will show how embedded data can be used for forecasting a time series and for characterizing the essential features of the dynamics that produced it. We close this section by presenting two extensions of the simple embedding considered so far.
Filtered embedding generalizes simple time-delay embedding by presenting a linearly transformed version of the lag vector to the next processing stage. The lag vector x is trivially equal to itself times an identity matrix. Rather than using the identity matrix, the lag vector can be multiplied by any (not necessarily square) matrix. The resulting vector is an embedding if the rank of the matrix is equal to or larger than the desired embedding dimension. (The window of lags can be larger than the final embedding dimension, which allows the embedding procedure to include additional signal processing.) A specific example, used by Sauer (this volume), is embedding with a matrix produced by multiplying a discrete Fourier transform, a low-pass filter, and an inverse Fourier transform; as long as the filter cut-off is chosen high enough to keep the rank of the overall transformation greater than or equal to the required embedding dimension, this will remove noise but will preserve the embedding. There are a number of more sophisticated linear filters that can be used for embedding (Oppenheim & Schafer, 1989), and we will also see that connectionist networks can be interpreted as sets of nonlinear filters.

[13] The Whitney embedding theorem from the 1930s (see Guillemin & Pollack, 1974, p. 48) guarantees that the number of independent observations d required to embed an arbitrary manifold (in the absence of noise) into a Euclidean embedding space will be no more than twice the dimension of the manifold. For example, a two-dimensional Möbius strip can be embedded in a three-dimensional Euclidean space, but a two-dimensional Klein bottle requires a four-dimensional space.
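Returning to the Fourier example of filtered embedding: a minimal sketch (ours, not Sauer's actual construction) builds the DFT, low-pass, inverse-DFT matrix and checks the rank condition stated above:

```python
import numpy as np

def lowpass_filter_matrix(w, keep):
    """Matrix form of: DFT, zero all but the lowest frequencies, inverse DFT.

    w    : window length (number of lags presented to the filter)
    keep : number of positive-frequency bins to retain
    """
    F = np.fft.fft(np.eye(w)) / np.sqrt(w)   # unitary DFT matrix
    mask = np.zeros(w)
    mask[:keep] = 1.0                        # keep the low frequencies...
    mask[w - keep + 1:] = 1.0                # ...and their conjugate mirrors
    L = np.diag(mask)
    return (np.conj(F.T) @ L @ F).real       # real-valued smoothing matrix

w, keep, d = 16, 5, 3
M = lowpass_filter_matrix(w, keep)
# The filtered lag vector M @ x remains an embedding as long as the rank of
# the overall transformation is at least the required embedding dimension d:
assert np.linalg.matrix_rank(M) >= d
```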
A final modification of time-delay embedding that can be useful in practice is embedding by expectation values. Often the goal of an analysis is to recover not the detailed trajectory of x(t), but rather to estimate the probability distribution p(x) for finding the system in the neighborhood of a point x. This probability is defined over a measurement of duration T in terms of an arbitrary test function g(x) by

\[ \frac{1}{T} \int_0^T g(\mathbf{x}(t))\, dt \;=\; \langle g(\mathbf{x}(t)) \rangle_t \;=\; \int g(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} \,. \qquad (15) \]

Note that this is an empirical definition of the probability distribution for the observed trajectory; it is not equivalent to assuming the existence of an invariant measure or of ergodicity so that the distribution is valid for all possible trajectories (Petersen, 1989).
If a complex exponential is chosen for the test function,

\[ \left\langle e^{i\mathbf{k}\cdot\mathbf{x}(t)} \right\rangle_t \;=\; \left\langle e^{i\mathbf{k}\cdot(x_t,\, x_{t-\tau},\, \ldots,\, x_{t-(d-1)\tau})} \right\rangle_t \;=\; \int e^{i\mathbf{k}\cdot\mathbf{x}}\, p(\mathbf{x})\, d\mathbf{x} \,, \qquad (16) \]
we see that the time average of this is equal to the Fourier transform of the desired probability distribution (this is just a characteristic function of the lag vector). This means that, if it is not possible to measure a time series directly (such as for very fast dynamics), it can still be possible to do time-delay embedding by measuring a set of time-average expectation values and then taking the inverse Fourier transform to find p(x) (Gershenfeld, 1993a). We will return to this point in Section 6.2 and show how embedding by expectation values can also provide a useful framework for distinguishing measurement noise from underlying dynamics.
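As a one-dimensional (d = 1) illustration of Eq. (16), the following sketch (ours, not from the paper; for lag vectors one would evaluate on a grid of k vectors instead) time-averages complex exponentials of a toy series to estimate the characteristic function, then applies a discrete Fourier transform to recover an estimate of p(x):

```python
import numpy as np

rng = np.random.default_rng(0)
# a stand-in scalar trajectory; any measured series would do here
x = np.sin(np.linspace(0.0, 200.0, 5000)) + 0.1 * rng.standard_normal(5000)

nk, dx = 256, 0.05                        # number of wavenumbers, spatial resolution
k = 2.0 * np.pi * np.fft.fftfreq(nk, d=dx)

# characteristic function: the time average <exp(i k x)>_t of Eq. (16)
phi = np.exp(1j * np.outer(k, x)).mean(axis=1)

# a discrete Fourier transform recovers an estimate of the density p(x)
p = np.real(np.fft.fft(phi)) / (nk * dx)
x_grid = np.fft.fftfreq(nk, d=1.0 / (nk * dx))  # sample points, with wraparound

print(np.sum(p) * dx)                     # normalization check: close to 1.0
```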
We have seen that time-delay embedding, while appearing similar to traditional state-space models with lagged vectors, makes a crucial link between behavior in the reconstructed state space and the internal degrees of freedom. We will apply this insight to forecasting and characterizing deterministic systems later (in Sections 5.1 and 6.2). Now we address the problem of what can be done if we are unable to understand the system in such explicit terms. The main idea will be to learn to emulate the behavior of the system.
4.2 LEARNING: NEURAL NETWORKS
In the competition, the majority of contributions, and also the best predictions for each set, used connectionist methods. They provide a convenient language game for nonlinear modeling. Connectionist networks are also known as neural networks, parallel distributed processing, or even as "brain-style computation"; we use these terms interchangeably. Their practical application (such as by large financial institutions for forecasting) has been marked by (and marketed with) great hope and hype (Schwarz, 1992; Hammerstrom, 1993).
Neural networks are typically used in pattern recognition, where a collection of features (such as an image) is presented to the network, and the task is to assign the input feature to one or more classes. Another typical use for neural networks is (nonlinear) regression, where the task is to find a smooth interpolation between points. In both these cases, all the relevant information is presented simultaneously. In contrast, time series prediction involves processing of patterns that evolve over time: the appropriate response at a particular point in time depends not only on the current value of the observable but also on the past.
Time series prediction has had an appeal for neural networkers from the very beginning of the field. In 1964, Hu applied Widrow's adaptive linear network to weather forecasting. In the post-backpropagation era, Lapedes and Farber (1987) trained their (nonlinear) network to emulate the relationship between output (the next point in the series) and inputs (its predecessors) for computer-generated time series, and Weigend, Huberman, and Rumelhart (1990, 1992) addressed the issue of finding networks of appropriate complexity for predicting observed (real-world) time series. In all these cases, temporal information is presented spatially to the network by a time-lagged vector (also called a tapped delay line).
A number of ingredients are needed to specify a neural network (a minimal sketch combining all four appears below):

- its interconnection architecture,
- its activation functions (that relate the output value of a node to its inputs),
- the cost function that evaluates the network's output (such as squared error), and
- a training algorithm that changes the interconnection parameters (called weights) in order to minimize the cost function.
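As an illustration of how these four ingredients fit together (the paper gives no code; all names and constants here are ours), a minimal one-hidden-layer network trained by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(1)

# architecture: d inputs -> h nonlinear hidden units -> one linear output
d, h = 4, 8
W1 = 0.1 * rng.standard_normal((h, d))   # input-to-hidden weights
W2 = 0.1 * rng.standard_normal(h)        # hidden-to-output weights

def forward(x):
    """Activation functions: tanh in the hidden layer, linear output."""
    hid = np.tanh(W1 @ x)
    return W2 @ hid, hid

def cost(out, target):
    """Cost function: squared error."""
    return (out - target) ** 2

def train_step(x, target, eta=0.01):
    """Training algorithm: one gradient-descent step (backpropagation)."""
    global W1, W2
    out, hid = forward(x)
    err = 2 * (out - target)                       # derivative of squared error
    gW2 = err * hid                                # dE/dW2
    gW1 = np.outer(err * W2 * (1 - hid**2), x)     # dE/dW1 via the chain rule
    W2 -= eta * gW2
    W1 -= eta * gW1
```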
The simplest case is a network without hidden units: it consists of one output unit that computes a weighted linear superposition of d inputs, out^{(t)} = \sum_{i=1}^{d} w_i x_i^{(t)}. The superscript (t) denotes a specific "pattern"; x_i^{(t)} is the value of the ith input of that pattern.[14] w_i is the weight between input i and the output. The network output can also be interpreted as a dot product w \cdot x^{(t)} between the weight vector w = (w_1, \ldots, w_d) and an input pattern x^{(t)} = (x_1^{(t)}, \ldots, x_d^{(t)}).
[14] In the context of time series prediction, x_i^{(t)} can be the ith component of the delay vector, x_i^{(t)} = x_{t-i}.
Given such an input-output relationship, the central task in learning is to find a way to change the weights such that the actual output out^{(t)} gets closer to the desired output or target^{(t)}. The closeness is expressed by a cost function, for example, the squared error E^{(t)} = (out^{(t)} - target^{(t)})^2. A learning algorithm iteratively updates the weights by taking a small step (parametrized by the learning rate η) in the direction that decreases the error the most, i.e., following the negative of the local gradient.[15]
The "new" weight \tilde{w}_i after the update is expressed in terms of the "old" weight w_i as

\[ \tilde{w}_i \;=\; w_i - \eta\, \frac{\partial E^{(t)}}{\partial w_i} \;=\; w_i + 2\eta \underbrace{x_i^{(t)}}_{\text{activation}} \underbrace{\left( \mathrm{target}^{(t)} - \mathrm{out}^{(t)} \right)}_{\text{error}} \,. \qquad (17) \]
The weight change (\tilde{w}_i - w_i) is proportional to the product of the activation going into the weight and the size of the error, here the deviation (target^{(t)} - out^{(t)}). This rule for adapting weights (for linear output units with squared errors) goes back to Widrow and Hoff (1960).
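A minimal sketch of this update rule, Eq. (17), in Python; the function name and stopping criterion are our own:

```python
import numpy as np

def lms_train(X, targets, eta=0.01, epochs=100):
    """Widrow-Hoff (LMS) training of a linear unit out = w . x.

    X       : (n_patterns, d) array of input patterns
    targets : (n_patterns,) array of desired outputs
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for x, target in zip(X, targets):
            out = w @ x                        # network output
            w += 2 * eta * (target - out) * x  # Eq. (17): activation times error
    return w
```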
If the input values are the lagged values of a time series and the output is the prediction for the next value, this simple network is equivalent to determining an AR(d) model through least squares regression: the weights at the end of training equal the coefficients of the AR model.
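This equivalence can be checked numerically. A sketch (ours), reusing the hypothetical delay_embed and lms_train helpers from the earlier snippets:

```python
import numpy as np

rng = np.random.default_rng(2)
series = np.zeros(2000)
for t in range(2, 2000):                  # a toy AR(2) series to fit
    series[t] = (0.6 * series[t - 1] - 0.3 * series[t - 2]
                 + 0.1 * rng.standard_normal())

d = 3
X = delay_embed(series[:-1], d=d, tau=1)  # lagged inputs ...
y = series[d:]                            # ... and the next value as target

w_net = lms_train(X, y, eta=0.1, epochs=200)   # trained network weights
w_ar, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares AR(d) fit

print(np.round(w_net, 2))                 # both close to (0.6, -0.3, 0.0)
print(np.round(w_ar, 2))
```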
Linear networks are very limited: exactly as limited as linear AR models. The key idea responsible for the power, potential, and popularity of connectionism is the insertion of one or more layers of nonlinear hidden units (between the inputs and output). These nonlinearities allow for interactions between the inputs (such as products between input variables) and thereby allow the network to fit more complicated functions. (This is discussed further in the subsection on neural networks and statistics below.)

The simplest such nonlinear network contains only one hidden layer and is