No Free Lunch Theorems for Optimization

David H. Wolpert

IBM Almaden Research Center
Harry Road
San Jose, CA

William G. Macready

Santa Fe Institute
Hyde Park Road
Santa Fe, NM

December 1996

Abstract

A framework is developed to explore the connection between effective optimization algorithms and the problems they are solving. A number of "no free lunch" (NFL) theorems are presented that establish that, for any algorithm, any elevated performance over one class of problems is exactly paid for in performance over another class. These theorems result in a geometric interpretation of what it means for an algorithm to be well suited to an optimization problem. Applications of the NFL theorems to information-theoretic aspects of optimization and benchmark measures of performance are also presented. Other issues addressed are time-varying optimization problems and a priori "head-to-head" minimax distinctions between optimization algorithms, distinctions that can obtain despite the NFL theorems' enforcing of a type of uniformity over all algorithms.

1 Introduction

The past few decades have seen increased interest in general-purpose "black-box" optimization algorithms that exploit little if any knowledge concerning the optimization problem on which they are run. In large part these algorithms have drawn inspiration from optimization processes that occur in nature. In particular, the two most popular black-box optimization strategies, evolutionary algorithms [FOW66, Hol93] and simulated annealing [KGV83], mimic processes in natural selection and statistical mechanics respectively.


In light of this interest in general-purpose optimization algorithms, it has become important to understand the relationship between how well an algorithm a performs and the optimization problem f on which it is run. In this paper we present a formal analysis that contributes towards such an understanding by addressing questions like the following. Given the plethora of black-box optimization algorithms and of optimization problems, how can we best match algorithms to problems (i.e., how best can we relax the black-box nature of the algorithms and have them exploit some knowledge concerning the optimization problem)? In particular, while serious optimization practitioners almost always perform such matching, it is usually on an ad hoc basis; how can such matching be formally analyzed? More generally, what is the underlying mathematical "skeleton" of optimization theory before the "flesh" of the probability distributions of a particular context and set of optimization problems are imposed? What can information theory and Bayesian analysis contribute to an understanding of these issues? How a priori generalizable are the performance results of a certain algorithm on a certain class of problems to its performance on other classes of problems? How should we even measure such generalization; how should we assess the performance of algorithms on problems so that we may programmatically compare those algorithms?

Broadly speaking, we take two approaches to these questions. First, we investigate what a priori restrictions there are on the pattern of performance of one or more algorithms as one runs over the set of all optimization problems. Our second approach is to instead focus on a particular problem and consider the effects of running over all algorithms. In the current paper we present results from both types of analyses, but concentrate largely on the first approach. The reader is referred to the companion paper [MW96] for more kinds of analysis involving the second approach.

We begin in Section 2 by introducing the necessary notation. Also discussed in this section is the model of computation we adopt, its limitations, and the reasons we chose it.

One might expect that there are pairs of search algorithms A and B such that A performs better than B on average, even if B sometimes outperforms A. As an example, one might expect that hill-climbing usually outperforms hill-descending if one's goal is to find a maximum of the cost function. One might also expect it would outperform a random search in such a context.

One of the main results of this paper is that such expectations are incorrect. We prove two NFL theorems in Section 3 that demonstrate this and, more generally, illuminate the connection between algorithms and problems. Roughly speaking, we show that for both static and time-dependent optimization problems, the average performance of any pair of algorithms across all possible problems is exactly identical. This means in particular that if some algorithm a_1's performance is superior to that of another algorithm a_2 over some set of optimization problems, then the reverse must be true over the set of all other optimization problems. (The reader is urged to read this section carefully for a precise statement of these theorems.) This is true even if one of the algorithms is random; any algorithm a_1 performs worse than randomly just as readily (over the set of all optimization problems) as it performs better than randomly. Possible objections to these results are also addressed in Sections 3.1 and 3.2.

In Section 4 we present a geometric interpretation of the NFL theorems. In particular, we show that an algorithm's average performance is determined by how "aligned" it is with the underlying probability distribution over optimization problems on which it is run. This section is critical for anyone wishing to understand how the NFL results are consistent with the well-accepted fact that many search algorithms that do not take into account knowledge concerning the cost function work quite well in practice.

Section 5 demonstrates that the NFL theorems allow one to answer a number of what would otherwise seem to be intractable questions. The implications of these answers for measures of algorithm performance, and of how best to compare optimization algorithms, are explored in Section 6.

In Section 7 we discuss some of the ways in which, despite the NFL theorems, algorithms can have a priori distinctions that hold even if nothing is specified concerning the optimization problems. In particular, we show that there can be "head-to-head" minimax distinctions between a pair of algorithms, i.e., we show that, considered one f at a time, a pair of algorithms may be distinguishable even if they are not when one looks over all f's.

In Section 8 we present an introduction to the alternative approach to the formal analysis of optimization, in which problems are held fixed and one looks at properties across the space of algorithms. Since these results hold in general, they hold for any and all optimization problems, and in this they are independent of what kinds of problems one is more or less likely to encounter in the real world. In particular, these results state that one has no a priori justification for using a search algorithm's behavior so far on a particular cost function to predict its future behavior on that function. In fact, when choosing between algorithms based on their observed performance it does not suffice to make an assumption about the cost function; some (currently poorly understood) assumptions are also being made about how the algorithms in question are related to each other and to the cost function. In addition to presenting results not found in [MW96], this section serves as an introduction to the perspective adopted in [MW96].

We conclude in Section 9 with a brief discussion, a summary of results, and a short list of open problems.

We have confined as many of our proofs to appendices as possible to facilitate the flow of the paper. A more detailed (and substantially longer) version of this paper, a version that also analyzes some issues not addressed in this paper, can be found in [WM95].

Finally, we cannot emphasize enough that no claims whatsoever are being made in this paper concerning how well various search algorithms work in practice. The focus of this paper is on what can be said a priori, without any assumptions and from mathematical principles alone, concerning the utility of a search algorithm.

2 Preliminaries

We restrict attention to combinatorial optimization in which the search space X, though perhaps quite large, is finite. We further assume that the space of possible cost values Y is also finite. These restrictions are automatically met for optimization algorithms run on digital computers. For example, typically Y is some 32 or 64 bit representation of the real numbers in such a case.

The sizes of the spaces X and Y are indicated by |X| and |Y| respectively. Optimization problems f (sometimes called "cost functions" or "objective functions" or "energy functions") are represented as mappings f : X → Y. F = Y^X is then the space of all possible problems. F is of size |Y|^|X|, a very large but finite number. In addition to static f, we shall also be interested in optimization problems that depend explicitly on time. The extra notation needed for such time-dependent problems will be introduced as needed.

It is common in the optimization community to adopt an oracle-based view of computation. In this view, when assessing the performance of algorithms, results are stated in terms of the number of function evaluations required to find a certain solution. Unfortunately though, many optimization algorithms are wasteful of function evaluations. In particular, many algorithms do not remember where they have already searched and therefore often revisit the same points. Although any algorithm that is wasteful in this fashion can be made more efficient simply by remembering where it has been (cf. tabu search [Glo89, Glo90]), many real-world algorithms elect not to employ this stratagem. Accordingly, from the point of view of the oracle-based performance measures, there are "artefacts" distorting the apparent relationship between many such real-world algorithms.

This difficulty is exacerbated by the fact that the amount of revisiting that occurs is a complicated function of both the algorithm and the optimization problem, and therefore cannot simply be filtered out of a mathematical analysis. Accordingly, we have elected to circumvent the problem entirely by comparing algorithms based on the number of distinct function evaluations they have performed. Note that this does not mean that we cannot compare algorithms that are wasteful of evaluations; it simply means that we compare algorithms by counting only their number of distinct calls to the oracle.

We call a time-ordered set of m distinct visited points a "sample" of size m. Samples are denoted by d_m ≡ {(d_m^x(1), d_m^y(1)), ..., (d_m^x(m), d_m^y(m))}. The points in a sample are ordered according to the time at which they were generated. Thus d_m^x(i) indicates the X value of the i-th successive element in a sample of size m, and d_m^y(i) is its associated cost or Y value. d_m^y ≡ {d_m^y(1), ..., d_m^y(m)} will be used to indicate the ordered set of cost values. The space of all samples of size m is D_m = (X × Y)^m (so d_m ∈ D_m), and the set of all possible samples of arbitrary size is D ≡ ∪_{m≥0} D_m.

As an important clarification of this definition, consider a hill-descending algorithm. This is the algorithm that examines a set of neighboring points in X and moves to the one having the lowest cost. The process is then iterated from the newly chosen point. (Often, implementations of hill-descending stop when they reach a local minimum, but they can easily be extended to run longer by randomly jumping to a new unvisited point once the neighborhood of a local minimum has been exhausted.) The point to note is that because a sample contains all the previous points at which the oracle was consulted, it includes the (X, Y) values of all the neighbors of the current point, and not only the lowest-cost one that the algorithm moves to. This must be taken into account when counting the value of m.
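As a concrete illustration (a minimal sketch of ours, not part of the paper's formalism), the following Python fragment represents a sample d_m as a time-ordered list of distinct (x, y) pairs and runs a toy hill-descender on an assumed ring-shaped search space. Note that every oracle call, including the neighbors the algorithm does not move to, enters the sample, so m counts all distinct evaluations.

```python
# Hypothetical sketch: a sample d_m is a time-ordered list of distinct
# (x, y) pairs.  The ring topology, cost function, and parameters below
# are illustrative assumptions.

def neighbors(x, n):
    """Neighbors of x on a ring of n points (an assumed topology)."""
    return [(x - 1) % n, (x + 1) % n]

def hill_descend(f, n, x0, max_calls):
    """Return the sample d_m: every distinct point at which f was queried."""
    sample = [(x0, f(x0))]                      # d_1
    visited = {x0}
    x = x0
    while len(sample) < max_calls:
        cands = [nb for nb in neighbors(x, n) if nb not in visited]
        if not cands:
            break                               # neighborhood exhausted
        for nb in cands:
            if len(sample) >= max_calls:
                break
            # Each oracle call enters the sample, not only the point
            # the algorithm finally moves to.
            sample.append((nb, f(nb)))
            visited.add(nb)
        x = min(sample, key=lambda p: p[1])[0]  # move to lowest cost seen
    return sample

def phi_min(sample):
    """A performance measure Phi(d_m^y): the lowest cost value seen."""
    return min(y for _, y in sample)

d = hill_descend(lambda x: (x - 3) ** 2, n=8, x0=0, max_calls=6)
print(len(d), phi_min(d))   # prints "6 0"
```

Here len(d) is exactly the m of the text: the number of distinct oracle calls, not the number of moves.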

Optimization algorithms a are represented as mappings from previously visited sets of points to a single new (i.e., previously unvisited) point in X. Formally, a : d ∈ D → {x | x ∉ d^x}. Given our decision to only measure distinct function evaluations even if an algorithm revisits previously searched points, our definition of an algorithm includes all common black-box optimization techniques like simulated annealing and evolutionary algorithms. (Techniques like branch and bound [LW66] are not included, since they rely explicitly on the cost structure of partial solutions, and we are here interested primarily in black-box algorithms.)

As defined above, a search algorithm is deterministic: every sample maps to a unique new point. Of course, essentially all algorithms implemented on computers are deterministic, and in this our definition is not restrictive. Nonetheless, it is worth noting that all of our results are extensible to non-deterministic algorithms, where the new point is chosen stochastically from the set of unvisited points. (This point is returned to below.)

Under the oracle-based model of computation, any measure of the performance of an algorithm after m iterations is a function of the sample d_m^y. Such performance measures will be indicated by Φ(d_m^y). As an example, if we are trying to find a minimum of f, then a reasonable measure of the performance of a might be the value of the lowest Y value in d_m^y: Φ(d_m^y) = min_i {d_m^y(i) : i = 1, ..., m}. Note that measures of performance based on factors other than d_m^y (e.g., wall clock time) are outside the scope of our results.

We shall cast all of our results in terms of probability theory. We do so for three reasons. First, it allows simple generalization of our results to stochastic algorithms. Second, even when the setting is deterministic, probability theory provides a simple, consistent framework in which to carry out proofs.

The third reason for using probability theory is perhaps the most interesting. A crucial factor in the probabilistic framework is the distribution P(f) = P(f(x_1), ..., f(x_|X|)). This distribution, defined over F, gives the probability that each f ∈ F is the actual optimization problem at hand. An approach based on this distribution has the immediate advantage that often knowledge of a problem is statistical in nature, and this information may be easily encodable in P(f). For example, Markov or Gibbs random field descriptions of families of optimization problems express P(f) exactly.

However, exploiting P(f) also has advantages even when we are presented with a single, uniquely specified cost function. One such advantage is the fact that although it may be fully specified, many aspects of the cost function are effectively unknown (e.g., we certainly do not know the extrema of the function). It is in many ways most appropriate to have this effective ignorance reflected in the analysis as a probability distribution. More generally, we usually act as though the cost function is partially unknown. For example, we might use the same search algorithm for all cost functions in a class (e.g., all traveling salesman problems having certain characteristics). In so doing, we are implicitly acknowledging that we consider distinctions between the cost functions in that class to be irrelevant, or at least unexploitable. In this sense, even though we are presented with a single particular problem from that class, we act as though we are presented with a probability distribution over cost functions, a distribution that is non-zero only for members of that class of cost functions. P(f) is thus a prior specification of the class of the optimization problem at hand, with different classes of problems corresponding to different choices of what algorithms we will

(In particular, note that random number generators are deterministic given a seed.)

use, and giving rise to different distributions P(f).

Given our choice to use probability theory, the performance of an algorithm a iterated m times on a cost function f is measured with P(d_m^y | f, m, a). This is the conditional probability of obtaining a particular sample d_m under the stated conditions. From P(d_m^y | f, m, a), performance measures Φ(d_m^y) can be found easily.

In the next section we will analyze P(d_m^y | f, m, a), and in particular how it can vary with the algorithm a. Before proceeding with that analysis, however, it is worth briefly noting that there are other formal approaches to the issues investigated in this paper. Perhaps the most prominent of these is the field of computational complexity. Unlike the approach taken in this paper, computational complexity mostly ignores the statistical nature of search and concentrates instead on computational issues. Much (though by no means all) of computational complexity is concerned with physically unrealizable computational devices (e.g., Turing machines) and the worst-case amount of resources they require to find optimal solutions. In contrast, the analysis in this paper does not concern itself with the computational engine used by the search algorithm, but rather concentrates exclusively on the underlying statistical nature of the search problem. In this the current probabilistic approach is complementary to computational complexity. Future work involves combining our analysis of the statistical nature of search with practical concerns for computational resources.

3 The NFL theorems

In this section we analyze the connection between algorithms and cost functions. We have dubbed the associated results "No Free Lunch" (NFL) theorems because they demonstrate that if an algorithm performs well on a certain class of problems, then it necessarily pays for that with degraded performance on the set of all remaining problems. Additionally, the name emphasizes the parallel with similar results in supervised learning [Wol96a, Wol96b].

The precise question addressed in this section is: How does the set of problems F_1 ⊂ F for which algorithm a_1 performs better than algorithm a_2 compare to the set F_2 ⊂ F for which the reverse is true? To address this question we compare the sum over all f of P(d_m^y | f, m, a_1) to the sum over all f of P(d_m^y | f, m, a_2). This comparison constitutes a major result of this paper: P(d_m^y | f, m, a) is independent of a when we average over all cost functions.

Theorem 1 For any pair of algorithms a_1 and a_2,

    Σ_f P(d_m^y | f, m, a_1) = Σ_f P(d_m^y | f, m, a_2).

A proof of this result is found in Appendix A. An immediate corollary of this result is that for any performance measure Φ(d_m^y), the average over all f of P(Φ(d_m^y) | f, m, a) is independent of a. The precise way that the sample is mapped to a performance measure is unimportant.

This theorem explicitly demonstrates that what an algorithm gains in performance on one class of problems it necessarily pays for on the remaining problems; that is the only way that all algorithms can have the same f-averaged performance.
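The theorem can be checked directly by brute force on a toy space. The sketch below is our illustration, not the paper's: it enumerates every cost function on a three-point X with binary Y and compares a fixed-order enumerator against an adaptive algorithm. The particular algorithms a1 and a2 are hypothetical.

```python
# A brute-force check of the NFL theorem (Theorem 1) on a toy space.
# X = {0,1,2}, Y = {0,1}, so F contains |Y|^|X| = 8 cost functions.
from itertools import product
from collections import Counter

X, Y, m = [0, 1, 2], [0, 1], 2

def a1(sample):
    """Fixed-order enumeration: visit 0, then 1, ignoring observed costs."""
    return len(sample)

def a2(sample):
    """Adaptive: start at 2; the second query depends on the observed cost."""
    if not sample:
        return 2
    return 0 if sample[0][1] == 0 else 1

def run(alg, f):
    """Return d_m^y, the ordered cost values of the first m distinct queries."""
    sample = []
    for _ in range(m):
        x = alg(sample)
        sample.append((x, f[x]))
    return tuple(y for _, y in sample)

def histogram(alg):
    """How often each d_m^y arises as f ranges over every function in F."""
    return Counter(run(alg, dict(zip(X, ys))) for ys in product(Y, repeat=len(X)))

# Theorem 1: summed over all f, the distribution of d_m^y is the same
# for any pair of algorithms.
print(histogram(a1) == histogram(a2))   # prints "True"
```

Any other pair of (non-revisiting) deterministic algorithms on this space gives the same equality, even though the samples they produce on individual functions differ.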

A result analogous to Theorem 1 holds for a class of time-dependent cost functions. The time-dependent functions we consider begin with an initial cost function f_1 that is present at the sampling of the first x value. Before the beginning of each subsequent iteration of the optimization algorithm, the cost function is deformed to a new function, as specified by a mapping T : F × N → F. We indicate this mapping with the notation T_i: the function present during the i-th iteration is f_i, where f_{i+1} = T_i(f_i). T_i is assumed to be a (potentially i-dependent) bijection between F and F. We impose bijectivity because if it did not hold, the evolution of cost functions could narrow in on a region of f's for which some algorithms may perform better than others. This would constitute an a priori bias in favor of those algorithms, a bias whose analysis we wish to defer to future work.

How best to assess the quality of an algorithm's performance on time-dependent cost functions is not clear. Here we consider two schemes based on manipulations of the definition of the sample. In scheme 1, the particular Y value d_m^y(j) corresponding to a particular x value d_m^x(j) is given by the cost function that was present when d_m^x(j) was sampled. In contrast, for scheme 2 we imagine a sample D_m^y given by the Y values from the present cost function for each of the x values in d_m^x. Formally, if d_m^x ≡ {d_m^x(1), ..., d_m^x(m)}, then in scheme 1 we have d_m^y = {f_1(d_m^x(1)), ..., f_m(d_m^x(m))}, and in scheme 2 we have D_m^y = {f_m(d_m^x(1)), ..., f_m(d_m^x(m))}, where f_m = T_{m-1}(f_{m-1}) is the final cost function.

In some situations it may be that the members of the sample "live" for a long time, on the time scale of the evolution of the cost function. In such situations it may be appropriate to judge the quality of the search algorithm by D_m^y; all those previous elements of the sample are still "alive" at time m, and therefore their current cost is of interest. On the other hand, if members of the sample live for only a short time on the time scale of evolution of the cost function, one may instead be concerned with things like how well the "living" member of the sample tracks the changing cost function. In such situations it may make more sense to judge the quality of the algorithm with the d_m^y sample.

Results similar to Theorem 1 can be derived for both schemes. By analogy with that theorem, we average over all possible ways a cost function may be time-dependent, i.e., we average over all T rather than over all f. Thus we consider Σ_T P(d_m^y | f_1, T, m, a), where f_1 is the initial cost function. Since T only takes effect for m > 1, and since f_1 is fixed, there are a priori distinctions between algorithms as far as the first member of the sample is concerned. However, after redefining samples to only contain those elements added after the first iteration of the algorithm, we arrive at the following result, proven in Appendix B.

Theorem 2 For all d_m^y, D_m^y, m > 1, algorithms a_1 and a_2, and initial cost functions f_1,

    Σ_T P(d_m^y | f_1, T, m, a_1) = Σ_T P(d_m^y | f_1, T, m, a_2),

and

    Σ_T P(D_m^y | f_1, T, m, a_1) = Σ_T P(D_m^y | f_1, T, m, a_2).

(An obvious restriction would be to require that T doesn't vary with time, so that it is a mapping simply from F to F. An analysis for T limited this way is beyond the scope of this paper.)


So, in particular, if one algorithm outperforms another for certain kinds of evolution operators, then the reverse must be true on the set of all other evolution operators.
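Theorem 2 can likewise be verified by brute force on a toy space, summing over every bijection T of F. In the sketch below (our illustration, with assumed toy algorithms and an arbitrary fixed f_1, not taken from the paper), the "redefined sample" is simply the cost obtained at the second iteration.

```python
# A brute-force check of the time-dependent NFL theorem (Theorem 2) on a
# toy space: X = {0,1,2}, Y = {0,1}, so |F| = 8.  We fix an initial cost
# function f1 and sum over all 8! bijections T of F, comparing the cost
# obtained on the second iteration by two hypothetical algorithms.
from itertools import product, permutations
from collections import Counter

F = list(product([0, 1], repeat=3))   # every f as a tuple (f(0), f(1), f(2))
f1 = (0, 1, 0)                        # an arbitrary fixed initial function

def a1(sample):
    return len(sample)                # fixed order: x = 0, then x = 1

def a2(sample):
    if not sample:
        return 0
    return 2 if sample[0][1] == 0 else 1   # adapts to the observed cost

def second_step_costs(alg):
    """Histogram of the second-iteration cost over all bijections T."""
    counts = Counter()
    i1 = F.index(f1)
    for T in permutations(F):         # T is a bijection F -> F
        f2 = T[i1]                    # cost function present at iteration 2
        x1 = alg([])
        x2 = alg([(x1, f1[x1])])
        counts[f2[x2]] += 1
    return counts

print(second_step_costs(a1) == second_step_costs(a2))   # prints "True"
```

The equality holds because, over all bijections, the deformed function T(f_1) is distributed uniformly over F regardless of which point the algorithm queries next.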

Although this particular result is similar to the NFL result for the static case, in general the time-dependent situation is more subtle. In particular, with time-dependence there are situations in which there can be a priori distinctions between algorithms even for those members of the sample arising after the first. For example, in general there will be distinctions between algorithms when considering the quantity Σ_f P(d_m^y | f, T, m, a). To see this, consider the case where X is a set of contiguous integers, and for all iterations T is a shift operator, replacing f(x) by f(x - 1) for all x (with min x replaced by max x). For such a case we can construct algorithms which behave differently a priori. For example, take a_1 to be the algorithm that first samples f at x_1, next at x_1 + 1, and so on, regardless of the values in the sample. Then for any f, d_m^y is always made up of identical Y values. Accordingly, Σ_f P(d_m^y | f, T, m, a_1) is non-zero only for d_m^y for which all values d_m^y(i) are identical. Other search algorithms, even for the same shift T, do not have this restriction on Y values. This constitutes an a priori distinction between algorithms.
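This shift-operator construction is easy to sketch numerically. In the hypothetical fragment below, the ring size, cost values, and starting point are arbitrary choices of ours; the point is only that an algorithm stepping in lock-step with the shift sees one and the same cost value at every iteration.

```python
# The shift-operator example: X is a ring of integers, T replaces f(x) by
# f(x - 1) each iteration, and algorithm a1 samples x1, x1 + 1, x1 + 2, ...
# so it "surfs" the shift.  The cost values below are arbitrary.
n, m, x1 = 6, 4, 2
f = [9, 3, 7, 1, 5, 8]                  # an arbitrary initial cost function

def shift(f):
    """T: the new function's value at x is the old function's value at x - 1."""
    return [f[(x - 1) % len(f)] for x in range(len(f))]

ys = []
for i in range(m):
    x = (x1 + i) % n                    # a1 queries x1, then x1 + 1, ...
    ys.append(f[x])
    f = shift(f)                        # deform the cost function

print(ys)   # prints "[7, 7, 7, 7]": all Y values identical, for any f
```

An algorithm that queried any other sequence of points would in general record distinct Y values under the same shift T, which is exactly the a priori distinction described above.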

3.1 Implications of the NFL theorems

As emphasized above, the NFL theorems mean that if an algorithm does particularly well on one class of problems, then it must do more poorly over the remaining problems. In particular, if an algorithm performs better than random search on some class of problems, then it must perform worse than random search on the remaining problems. Thus comparisons reporting the performance of a particular algorithm with a particular parameter setting on a few sample problems are of limited utility. While such results do indicate behavior on the narrow range of problems considered, one should be very wary of trying to generalize those results to other problems.

Note though that the NFL theorem need not be viewed this way, i.e., as a way of comparing function classes F_1 and F_2 (or classes of evolution operators T_1 and T_2, as the case might be). It can be viewed instead as a statement concerning any algorithm's performance when f is not fixed, under the uniform prior over cost functions, P(f) = 1/|F|. If we wish instead to analyze performance where f is not fixed, as in this alternative interpretation of the NFL theorem, but in contrast with the NFL case f is now chosen from a non-uniform prior, then we must analyze explicitly the sum

    P(d_m^y | m, a) = Σ_f P(d_m^y | f, m, a) P(f).

Since it is certainly true that any class of problems faced by a practitioner will not have a flat prior, what are the practical implications of the NFL theorems when viewed as a statement concerning an algorithm's performance for non-fixed f? This question is taken up in greater detail below, but we make a few comments here.

First, if the practitioner has knowledge of problem characteristics but does not incorporate them into the optimization algorithm, then P(f) is effectively uniform. (Recall that P(f) can be viewed as a statement concerning the practitioner's choice of optimization algorithms.) In such a case, the NFL theorems establish that there are no formal assurances that the algorithm chosen will be at all effective.

Secondly, while most classes of problems will certainly have some structure which, if known, might be exploitable, the simple existence of that structure does not justify choice of a particular algorithm; that structure must be known and reflected directly in the choice of algorithm to serve as such a justification. In other words, the simple existence of structure per se, absent a specification of that structure, cannot provide a basis for preferring one algorithm over another. Formally, this is established by the existence of NFL-type theorems in which, rather than averaging over specific cost functions f, one averages over specific "kinds of structure", i.e., theorems in which one averages P(d_m^y | m, a) over distributions P(f). That such theorems hold when one averages over all P(f) means that the indistinguishability of algorithms associated with uniform P(f) is not some pathological, outlier case. Rather, uniform P(f) is a "typical" distribution as far as indistinguishability of algorithms is concerned. The simple fact that the P(f) at hand is non-uniform cannot serve to determine one's choice of optimization algorithm.

Finally, it is important to emphasize that even if one is considering the case where f is not fixed, performing the associated average according to a uniform P(f) is not essential for NFL to hold. NFL can also be demonstrated for a range of non-uniform priors. For example, any prior of the form Π_x P(f(x)) (where P(y = f(x)) is the distribution of Y values) will also give NFL. The f-average can also enforce correlations between costs at different X values with NFL still obtaining. For example, if costs are rank-ordered (with ties broken in some arbitrary way), and we sum only over all cost functions given by permutations of those orderings, then NFL still holds.

The choice of uniform P(f) was motivated more from theoretical rather than pragmatic concerns, as a way of analyzing the theoretical structure of optimization. Nevertheless, the cautionary observations presented above make clear that an analysis of the uniform P(f) case has a number of ramifications for practitioners.

3.2 Stochastic optimization algorithms

Thus far we have considered the case in which algorithms are deterministic. What is the situation for stochastic algorithms? As it turns out, NFL results hold even for such algorithms.

The proof of this is straightforward. Let σ be a stochastic "non-potentially-revisiting" algorithm. Formally, this means that σ is a mapping taking any sample d to a d-dependent distribution over X that equals zero for all x ∈ d^x. (In this sense σ is what in the statistics community is known as a "hyper-parameter", specifying the function P(d_{m+1}^x(m+1) | d_m, σ) for all m and d_m.) One can now reproduce the derivation of the NFL result for deterministic algorithms, only with a replaced by σ throughout. In so doing, all steps in the proof remain valid. This establishes that NFL results apply to stochastic algorithms as well as deterministic ones.

4 A geometric perspective on the NFL theorems

Intuitively, the NFL theorem illustrates that even if knowledge of f (perhaps specified through P(f)) is not incorporated into a, then there are no formal assurances that a will be effective. Rather, effective optimization relies on a fortuitous matching between f and a. This point is formally established by viewing the NFL theorem from a geometric perspective.

Consider the space F of all possible cost functions. As discussed previously, the probability of obtaining some d_m^y is

    P(d_m^y | m, a) = Σ_f P(d_m^y | m, a, f) P(f),

where P(f) is the prior probability that the optimization problem at hand has cost function f. This sum over functions can be viewed as an inner product in F. More precisely, defining the F-space vectors v_{d_m^y} and p by their f components v_{d_m^y}(f) ≡ P(d_m^y | m, a, f) and p(f) ≡ P(f) respectively,

    P(d_m^y | m, a) = v_{d_m^y} · p.

y

This equation pro vides a geometric i n terpretation of the optimization pro cess d can

m

b e view ed as ed to the sample that is desired sually u one with a lo w ost c v alue and m

is a measure of the computational resources that can b e arded An y kno wledge of the

prop erties of the cost unction f go es in to the p rior o v er cost functions p Then Equation

sa ys the p erformance of an algorithm s i determined b y the magnitude f o its pro jection

y

on to p i b yho w a ligned v is with the problems p Alternativ ya v eraging o v er

d

m

y y y

d itisesyta o ees atth E d j m a s i a n nner i pro duct b et w een p and E d j m a f The

m m m

y

exp ectation of an y p erformance measure d can b e written similarly

m

In any of these cases, P(f) or p must "match" or be aligned with a to get the desired behavior. This need for matching provides a new perspective on how certain algorithms can perform well in practice on specific kinds of problems. For example, it means that the years of research into the traveling salesman problem (TSP) have resulted in algorithms aligned with the (implicit) p describing traveling salesman problems of interest to TSP researchers.

P

y

T aking the g eometric view the NFL result that P d j f m a i s i ndep enden tof a has

f

m

y

the in terpretation that f or an y articular p d m all algorithms a v e the same pro jection

m

y

on to the he t uniform P f represen b y the diagonal v ector F ormally v

d

m

y

y

cst d F or deterministic algorithms the comp onen ts of v i the robabilities p

d

m

m

y

that algorithm a giv es sample d on cost function f after m distinct cost ev aluations are

m

P

y y

all either or so NFL also implies that P d j m a f cst d Geometrically

f

m m

y

this indicates that the length of v is indep enden tof a Diren t algorithms th us

d

m

y

generate diren tv ectors v ha ving the same length and l ying on a one c w ith c onstan t

d

m

pro jection on to sc hematic of this situation is sho wn in Figure f or the c ase where

F is dimensional Because the comp onen ts of v are binary w emgih teivqu alen

c

y

view v as lying on the s ubset the v ertices of the Bo olean h yp ercub e ha ving the same

d

m

hamming distance from

tly

all

ted

ha and

ely

and

een1

p

Figure: Schematic view of the situation in which the function space $\mathcal{F}$ is 3-dimensional. The uniform prior over this space, $\vec{1}$, lies along the diagonal. Different algorithms $a$ give different vectors $\vec{v}$ lying in the cone surrounding the diagonal. A particular problem is represented by its prior $\vec{p}$ lying on the simplex. The algorithm that will perform best will be the algorithm in the cone having the largest inner product with $\vec{p}$.

Now restrict attention to algorithms having the same probability of some particular $d_m^y$. The algorithms in this set lie in the intersection of two cones: one about the diagonal, set by the NFL theorem, and one set by having the same probability for $d_m^y$. This is in general an $|\mathcal{F}| - 2$ dimensional manifold. Continuing, as we impose yet more $d_m^y$-based restrictions on a set of algorithms, we will continue to reduce the dimensionality of the manifold by focusing on intersections of more and more cones.
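The NFL statement underlying this geometry (that every deterministic non-retracing algorithm has the same projection onto the uniform prior) can be checked by brute force on a toy space. The sketch below is an illustration, not a construction from the text: the two algorithms `left_to_right` and `adaptive` are invented for the demonstration. It enumerates all $|\mathcal{Y}|^{|\mathcal{X}|}$ cost functions and confirms that both algorithms induce identical counts for every $d_m^y$:

```python
from itertools import product
from collections import Counter

X = range(4)   # search space, |X| = 4
Y = (0, 1)     # cost values,  |Y| = 2
m = 3          # number of distinct cost evaluations

def run(algorithm, f):
    """Run a deterministic non-retracing algorithm on f; return d_m^y."""
    d = []                       # the sample: a list of (x, y) pairs
    for _ in range(m):
        x = algorithm(d)
        d.append((x, f[x]))
    return tuple(y for _, y in d)

def left_to_right(d):
    visited = {x for x, _ in d}
    return min(x for x in X if x not in visited)

def adaptive(d):
    # An invented value-dependent rule: after a low cost, stay close; else jump far.
    visited = {x for x, _ in d}
    unvisited = [x for x in X if x not in visited]
    if not d:
        return 3
    last_x, last_y = d[-1]
    key = lambda x: abs(x - last_x)
    return min(unvisited, key=key) if last_y == 0 else max(unvisited, key=key)

def counts_over_f(algorithm):
    """Unnormalized sum_f P(d_m^y | f, m, a): count the f producing each d_m^y."""
    c = Counter()
    for f in product(Y, repeat=len(X)):  # all |Y|^|X| cost functions
        c[run(algorithm, f)] += 1
    return c

c1, c2 = counts_over_f(left_to_right), counts_over_f(adaptive)
assert c1 == c2                 # identical projections: the NFL theorem
assert set(c1.values()) == {2}  # each d_m^y arises from |Y|^(|X|-m) = 2 functions
```

Every $y$-sequence is produced by exactly $|\mathcal{Y}|^{|\mathcal{X}|-m}$ functions, regardless of how the algorithm chooses its queries.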

The geometric view of optimization also suggests alternative measures for determining how similar two optimization algorithms are. Consider again the inner-product equation above. In that the algorithm directly only gives $\vec{v}_{d_m^y}$, perhaps the most straightforward way to compare two algorithms $a_1$ and $a_2$ would be by measuring how similar the vectors $\vec{v}_{d_m^y,a_1,m}$ and $\vec{v}_{d_m^y,a_2,m}$ are, e.g., by evaluating the dot product of those vectors. However, those vectors occur on the right-hand side of the equation, whereas the performance of the algorithms (which is, after all, our ultimate concern) instead occurs on the left-hand side. This suggests measuring the similarity of two algorithms not directly in terms of their vectors $\vec{v}_{d_m^y}$, but rather in terms of the dot products of those vectors with $\vec{p}$. For example, it may be the case that algorithms behave very similarly for certain $P(f)$ but are quite different for other $P(f)$. In many respects, knowing this about two algorithms is of more interest than knowing how their vectors $\vec{v}_{d_m^y}$ compare.

As another example of a similarity measure suggested by the geometric perspective, we could measure similarity between algorithms based on similarities between $P(f)$'s. For example, for two different algorithms, one can imagine solving for the $P(f)$ that optimizes $P(d_m^y \mid m, a)$ for those algorithms, in some non-trivial sense. We could then use some measure of distance between those two $P(f)$ distributions as a gauge of how similar the associated algorithms are.

Unfortunately, exploiting the inner product formula in practice, by going from a $P(f)$ to an algorithm optimal for that $P(f)$, appears to often be quite difficult. Indeed, even determining a plausible $P(f)$ for the situation at hand is often difficult. Consider, for example, TSP problems with $N$ cities. To the degree that any practitioner attacks all $N$-city TSP cost functions with the same algorithm, that practitioner implicitly ignores distinctions between such cost functions. In this, that practitioner has implicitly agreed that the problem is one of how their fixed algorithm does across the set of all $N$-city TSP cost functions. However, the detailed nature of the $P(f)$ that is uniform over this class of problems appears to be difficult to elucidate.

On the other hand, there is a growing body of work that does rely explicitly on enumeration of $P(f)$. For example, applications of Markov random fields [Gri, KS] to cost landscapes yield $P(f)$ directly as a Gibbs distribution.

Calculational applications of the NFL theorems

In this section we explore some of the applications of the NFL theorems for performing calculations concerning optimization. We will consider calculations of both practical and theoretical interest, and begin with calculations of theoretical interest, in which information-theoretic quantities arise naturally.

Information-theoretic aspects of optimization

For expository purposes, we simplify the discussion slightly by considering only the histogram of the number of instances of each possible cost value produced by a run of an algorithm, and not the temporal order in which those cost values were generated. (Essentially all real-world performance measures are independent of such temporal information.) We indicate that histogram with the symbol $\vec{c}$; $\vec{c}$ has $|\mathcal{Y}|$ components $(c_{Y_1}, c_{Y_2}, \ldots, c_{Y_{|\mathcal{Y}|}})$, where $c_i$ is the number of times cost value $Y_i$ occurs in the sample $d_m^y$.

Now consider any question like the following: "What fraction of cost functions give a particular histogram $\vec{c}$ of cost values after $m$ distinct cost evaluations produced by using a particular instantiation of an evolutionary algorithm [FOW, Hol]?"

At first glance this seems to be an intractable question. However, it turns out that the NFL theorem provides a way to answer it. This is because, according to the NFL theorem, the answer must be independent of the algorithm used to generate $\vec{c}$. Consequently, we can choose an algorithm for which the calculation is tractable.

(Footnote to the discussion of solving for an optimizing $P(f)$ above: In particular, one may want to impose restrictions on $P(f)$. For instance, one may wish to only consider $P(f)$ that are invariant under at least partial relabelling of the elements in $\mathcal{X}$, to preclude there being an algorithm that will assuredly "luck out" and land on $\min_x f(x)$ on its very first query.)

Theorem. For any algorithm, the fraction of cost functions that result in a particular histogram $\vec{c} = m\vec{\alpha}$ is

$$\rho_f(\vec{\alpha}) \;=\; \frac{\binom{m}{c_1,\, c_2,\, \ldots,\, c_{|\mathcal{Y}|}}\; |\mathcal{Y}|^{|\mathcal{X}|-m}}{|\mathcal{Y}|^{|\mathcal{X}|}} \;=\; \frac{\binom{m}{c_1,\, c_2,\, \ldots,\, c_{|\mathcal{Y}|}}}{|\mathcal{Y}|^{m}}.$$

For large enough $m$, this can be approximated as

$$\rho_f(\vec{\alpha}) \;\approx\; \frac{e^{m S(\vec{\alpha})}}{|\mathcal{Y}|^{m}}\; \frac{C(m, |\mathcal{Y}|)}{\prod_{i=1}^{|\mathcal{Y}|} \alpha_i^{1/2}},$$

where $S(\vec{\alpha}) \equiv -\sum_i \alpha_i \ln \alpha_i$ is the entropy of the distribution $\vec{\alpha}$, and $C(m, |\mathcal{Y}|)$ is a constant that does not depend on $\vec{\alpha}$.

This theorem is derived in Appendix C. If some of the $\alpha_i$ are $0$, the approximation still holds, only with $\mathcal{Y}$ redefined to exclude the $y$'s corresponding to the zero-valued $\alpha_i$. However $\mathcal{Y}$ is defined, the normalization constant of the approximation can be found by summing over all $\vec{\alpha}$ lying on the unit simplex.
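The exact (first) equality in this theorem can be confirmed by enumeration on a small space. Since the fraction is algorithm-independent, we may use the trivial algorithm that queries the first $m$ points of $\mathcal{X}$ in order; sizes below are chosen arbitrarily for illustration:

```python
from itertools import product
from math import factorial
from collections import Counter

X_size, Y = 5, (1, 2, 3)   # |X| = 5 points, |Y| = 3 cost values
m = 3                      # histogram of the first m sampled costs

# The fraction is algorithm-independent, so use the trivial algorithm that
# queries the first m points of X in order.
counts = Counter()
for f in product(Y, repeat=X_size):        # all |Y|^|X| cost functions
    c = tuple(f[:m].count(y) for y in Y)   # histogram c of the sampled costs
    counts[c] += 1

total = len(Y) ** X_size
for c, n in counts.items():
    multinomial = factorial(m)
    for ci in c:
        multinomial //= factorial(ci)
    # rho_f(c) = multinomial(m; c) / |Y|^m, checked as an exact integer identity
    assert n * len(Y) ** m == multinomial * total
```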

A question related to the one addressed in this theorem is the following: "For a given cost function, what is the fraction $\rho_{alg}$ of all algorithms that give rise to a particular $\vec{c}$?" It turns out that the only feature of $f$ relevant for this question is the histogram of its cost values formed by looking across all of $\mathcal{X}$. Specify the fractional form of this histogram by $\vec{\beta}$: there are $N_i = \beta_i |\mathcal{X}|$ points in $\mathcal{X}$ for which $f(x)$ has the $i$'th $\mathcal{Y}$ value.

In Appendix D it is shown that, to leading order, $\rho_{alg}(\vec{\alpha})$ depends on yet another information-theoretic quantity, the Kullback-Leibler distance [CT] between $\vec{\alpha}$ and $\vec{\beta}$.

Theorem. For a given $f$ with histogram $\vec{N} = |\mathcal{X}|\vec{\beta}$, the fraction of algorithms that give rise to a histogram $\vec{c} = m\vec{\alpha}$ is given by

$$\rho_{alg}(\vec{\alpha}) \;=\; \frac{\prod_{i=1}^{|\mathcal{Y}|} \binom{N_i}{c_i}}{\binom{|\mathcal{X}|}{m}}.$$

For large enough $m$, this can be written as

$$\rho_{alg}(\vec{\alpha}) \;\approx\; C(m, |\mathcal{X}|, |\mathcal{Y}|)\; \frac{e^{-m\, D_{KL}(\vec{\alpha}, \vec{\beta})}}{\prod_{i=1}^{|\mathcal{Y}|} \alpha_i^{1/2}},$$

where $D_{KL}(\vec{\alpha}, \vec{\beta})$ is the Kullback-Leibler distance between the distributions $\vec{\alpha}$ and $\vec{\beta}$.

As before, $C(m, |\mathcal{X}|, |\mathcal{Y}|)$ can be calculated by summing over the unit simplex.
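The combinatorial form of $\rho_{alg}$ can also be checked by enumeration, assuming (as in the Appendix D derivation referenced above) that the histogram produced by a deterministic non-retracing algorithm depends only on which $m$-element subset of $\mathcal{X}$ it visits, so that counting subsets suffices. A sketch with an arbitrarily chosen small $f$:

```python
from itertools import combinations
from math import comb
from collections import Counter

# An arbitrary cost function on |X| = 8 points taking |Y| = 3 values.
f = [1, 2, 2, 3, 1, 1, 2, 3]
Y = (1, 2, 3)
N = [f.count(y) for y in Y]   # cost-value histogram of f across all of X
m = 4

# Count, for each histogram c, the m-subsets of X whose costs realize it.
counts = Counter()
for subset in combinations(range(len(f)), m):
    c = tuple(sum(1 for x in subset if f[x] == y) for y in Y)
    counts[c] += 1

for c, n in counts.items():
    predicted = 1
    for Ni, ci in zip(N, c):
        predicted *= comb(Ni, ci)
    assert n == predicted   # numerator of rho_alg: prod_i C(N_i, c_i)
assert sum(counts.values()) == comb(len(f), m)   # denominator: C(|X|, m)
```

This is exactly the multivariate hypergeometric counting identity behind the theorem.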

Measures of performance

We now show how to apply the NFL framework to calculate certain benchmark performance measures. These allow both the programmatic (rather than ad hoc) assessment of the efficacy of any individual optimization algorithm and principled comparisons between algorithms.

Without loss of generality, assume that the goal of the search process is finding a minimum. So we are interested in the $\epsilon$-dependence of $P(\min(\vec{c}) > \epsilon \mid f, m, a)$, by which we mean the probability that the minimum cost an algorithm $a$ finds on problem $f$ in $m$ distinct evaluations is larger than $\epsilon$. At least three quantities related to this conditional probability can be used to gauge an algorithm's performance in a particular optimization run:

i) The uniform average of $P(\min(\vec{c}) > \epsilon \mid f, m, a)$ over all cost functions;

ii) The form $P(\min(\vec{c}) > \epsilon \mid f, m, a)$ takes for the random algorithm, which uses no information from the sample $d_m$;

iii) The fraction of algorithms which, for a particular $f$ and $m$, result in a $\vec{c}$ whose minimum exceeds $\epsilon$.

These measures give benchmarks which any algorithm run on a particular cost function should surpass if that algorithm is to be considered as having worked well for that cost function.

Without loss of generality, assume that the $i$'th cost value (i.e., $Y_i$) equals $i$. So cost values run from a minimum of $1$ to a maximum of $|\mathcal{Y}|$, in integer increments. The following results are derived in Appendix E.

Theorem.

$$\frac{1}{|\mathcal{Y}|^{|\mathcal{X}|}} \sum_f P(\min(\vec{c}) > \epsilon \mid f, m) \;=\; \omega^{m}(\epsilon),$$

where $\omega(\epsilon) \equiv 1 - \epsilon/|\mathcal{Y}|$ is the fraction of cost values lying above $\epsilon$. In the limit of $|\mathcal{Y}| \to \infty$, this distribution obeys the following relationship:

$$\frac{1}{|\mathcal{Y}|^{|\mathcal{X}|}} \sum_f \frac{E(\min(\vec{c}) \mid f, m)}{|\mathcal{Y}|} \;=\; \frac{1}{m+1}.$$
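The $1/(m+1)$ limit is easy to check by Monte Carlo: for a uniformly random cost function, the values at $m$ distinct points are i.i.d. uniform over $\mathcal{Y}$, so the scaled expected minimum can be estimated directly (sample sizes below are arbitrary):

```python
import random

random.seed(0)
Y_size, m, trials = 10_000, 4, 20_000

# For a uniformly random f, costs at m distinct points are i.i.d. uniform on Y,
# so average the sample minimum over many random draws.
avg_min = sum(min(random.randint(1, Y_size) for _ in range(m))
              for _ in range(trials)) / trials

# The theorem predicts E(min)/|Y| -> 1/(m+1) = 0.2 as |Y| grows.
assert abs(avg_min / Y_size - 1 / (m + 1)) < 0.01
```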

Unless one's algorithm has its best-cost-so-far drop faster than the drop associated with these results, one would be hard-pressed indeed to claim that the algorithm is well-suited to the cost function at hand. After all, for such performance the algorithm is doing no better than one would expect it to do for a randomly chosen cost function.

Unlike the preceding measure, the measures analyzed below take into account the actual cost function at hand. This is manifested in the dependence of the values of those measures on the vector $\vec{N}$ given by the cost function's histogram ($\vec{N} = |\mathcal{X}|\vec{\beta}$).

Theorem. For the random algorithm $\tilde{a}$:

$$P(\min(\vec{c}) > \epsilon \mid f, m, \tilde{a}) \;=\; \prod_{i=0}^{m-1} \frac{\omega(\epsilon)\,|\mathcal{X}| - i}{|\mathcal{X}| - i},$$

where $\omega(\epsilon) \equiv \sum_{i=\epsilon+1}^{|\mathcal{Y}|} N_i / |\mathcal{X}|$ is the fraction of points in $\mathcal{X}$ for which $f(x) > \epsilon$. To first order in $1/|\mathcal{X}|$:

$$P(\min(\vec{c}) > \epsilon \mid f, m, \tilde{a}) \;=\; \omega^{m}(\epsilon)\left(1 - \frac{m(m-1)}{2}\,\frac{1 - \omega(\epsilon)}{\omega(\epsilon)\,|\mathcal{X}|} + \cdots\right).$$

This result allows the calculation of other quantities of interest for measuring performance, for example the quantity

$$E(\min(\vec{c}) \mid f, m, \tilde{a}) \;=\; \sum_{\epsilon=1}^{|\mathcal{Y}|} \epsilon \left[ P(\min(\vec{c}) > \epsilon - 1 \mid f, m, \tilde{a}) - P(\min(\vec{c}) > \epsilon \mid f, m, \tilde{a}) \right].$$

Note that for many cost functions of both practical and theoretical interest, cost values are distributed Gaussianly. For such cases, we can use the Gaussian nature of the distribution to facilitate our calculations. In particular, if the mean and variance of the Gaussian are $\mu$ and $\sigma^2$ respectively, then we have $\omega(\epsilon) = \mathrm{erfc}\big((\epsilon - \mu)/\sqrt{2}\,\sigma\big)/2$, where erfc is the complementary error function.
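The product formula for the random algorithm is just the statement that all $m$-subsets of $\mathcal{X}$ are equally likely; a brute-force comparison on an arbitrarily chosen small $f$:

```python
from itertools import combinations
from math import prod

# An arbitrary cost function on |X| = 8 points; eps is the cost threshold.
f = [3, 1, 4, 1, 5, 2, 2, 3]
m, eps = 3, 2
X = range(len(f))

# Direct probability: every m-subset of X is equally likely under the random
# algorithm, and we require that all sampled costs exceed eps.
subsets = list(combinations(X, m))
p_direct = sum(all(f[x] > eps for x in s) for s in subsets) / len(subsets)

# The theorem's product formula, with omega = fraction of points with f(x) > eps.
omega = sum(1 for x in X if f[x] > eps) / len(f)
p_formula = prod((omega * len(f) - i) / (len(f) - i) for i in range(m))

assert abs(p_direct - p_formula) < 1e-12   # both equal C(4,3)/C(8,3) = 1/14 here
```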

To calculate the third performance measure, note that for fixed $f$ and $m$, for any deterministic algorithm $a$, $P(\vec{c} \mid f, m, a)$ is either $1$ or $0$. Therefore the fraction of algorithms which result in a $\vec{c}$ whose minimum exceeds $\epsilon$ is given by

$$\frac{\sum_a P(\min(\vec{c}) > \epsilon \mid f, m, a)}{\sum_a 1}.$$

Expanding in terms of $\vec{c}$, we can rewrite the numerator of this ratio as $\sum_{\vec{c}} P(\min(\vec{c}) > \epsilon) \sum_a P(\vec{c} \mid f, m, a)$. However, the ratio of this quantity to $\sum_a 1$ is exactly what was calculated when we evaluated measure ii) (see the beginning of the argument deriving the preceding theorem). This establishes the following:

Theorem. For fixed $f$ and $m$, the fraction of algorithms which result in a $\vec{c}$ whose minimum exceeds $\epsilon$ is given by the quantity on the right-hand sides of the equations in the preceding theorem.

As a particular example of applying this result, consider measuring the value of $\min(\vec{c})$ produced in a particular run of your algorithm. Then imagine that, when it is evaluated for $\epsilon$ equal to this value, the quantity given by those equations is less than $1/2$. In such a situation, the algorithm in question has performed worse than over half of all search algorithms for the $f$ and $m$ at hand, hardly a stirring endorsement.

None of the discussion above explicitly concerns the dynamics of an algorithm's performance as $m$ increases. Many aspects of such dynamics may be of interest. As an example, let us consider whether, as $m$ grows, there is any change in how well the algorithm's performance compares to that of the random algorithm.

To this end, let the sample generated by the algorithm $a$ after $m$ steps be $d_m$, and define $y^* \equiv \min(d_m^y)$. Let $k$ be the number of additional steps it takes the algorithm to find an $x$ such that $f(x) < y^*$. Now we can estimate the number of steps it would have taken the random search algorithm to search $\mathcal{X} - d_m^x$ and find a point whose $y$ was less than $y^*$. The expected value of this number of steps is $1/z(d)$, where $z(d)$ is the fraction of $\mathcal{X} - d_m^x$ for which $f(x) < y^*$. Therefore $k\,z(d) - 1$ shows how much worse $a$ did than would have the random algorithm, on average.

Next, imagine letting $a$ run for many steps over some fitness function $f$, and plotting how well $a$ did in comparison to the random algorithm on that run as $m$ increased. Consider the step where $a$ finds its $n$'th new value of $\min(\vec{c})$. For that step there is an associated $k$ (the number of steps until the next $\min(d_m^y)$) and $z(d)$. Accordingly, indicate that step on our plot as the point $(n,\, k\,z(d) - 1)$. Put down as many points on our plot as there are successive values of $\min(\vec{c})$ in the run of $a$ over $f$.

If throughout the run $a$ is always a better match to $f$ than is the random search algorithm, then all the points in the plot will have their ordinate values lie below $0$. If the random algorithm won some of the comparisons, though, that would mean a point lying above $0$. In general, even if the points all lie to one side of $0$, one would expect that as the search progresses there is a corresponding (perhaps systematic) variation in how far away from $0$ the points lie. That variation tells one when the algorithm is entering harder or easier parts of the search.

Note that even for a fixed $f$, by using different starting points for the algorithm one could generate many of these plots and then superimpose them. This allows a plot of the mean value of $k\,z(d) - 1$ as a function of $n$, along with an associated error bar. Similarly, one could replace the single number $z(d)$ characterizing the random algorithm with a full distribution over the number of required steps to find a new minimum. In these and similar ways, one can generate a more nuanced picture of an algorithm's performance than is provided by any of the single numbers given by the performance measures discussed above.
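The bookkeeping described above is straightforward to prototype. In the sketch below, the nearest-unvisited "hill-climber" and the rugged cost function are illustrative inventions, not constructions from the text, and $z$ is approximated using the unvisited set at the moment the new best is found, a simplification of the definition above:

```python
import random

random.seed(1)
X = list(range(200))                                 # toy search space
f = {x: (x * 37) % 101 + random.random() for x in X} # a rugged toy cost function

def diagnostic_points():
    """Return the (n, k*z(d) - 1) points for one nearest-unvisited climb."""
    visited = set()
    cur = random.choice(X)
    best, k, points = float("inf"), 0, []
    while True:
        visited.add(cur)
        k += 1
        if f[cur] < best:
            remaining = [u for u in X if u not in visited]
            if remaining and best < float("inf"):
                # z(d): fraction of unvisited points whose cost beats old best y*
                z = sum(f[u] < best for u in remaining) / len(remaining)
                points.append((len(points) + 1, k * z - 1))
            best, k = f[cur], 0
        unvisited = [u for u in X if u not in visited]
        if not unvisited:
            return points
        cur = min(unvisited, key=lambda u: abs(u - cur))  # step to nearest unvisited

pts = diagnostic_points()
print(pts[:5])   # ordinates below 0 mark stretches where the climber beat random search
```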

Minimax distinctions between algorithms

The NFL theorems do not directly address minimax properties of search. For example, say we are considering two deterministic algorithms $a_1$ and $a_2$. It may very well be that there exist cost functions $f$ such that $a_1$'s histogram is much better (according to some appropriate performance measure) than $a_2$'s, but no cost functions for which the reverse is true. For the NFL theorem to be obeyed in such a scenario, it would have to be true that there are many more $f$ for which $a_2$'s histogram is better than $a_1$'s than vice versa, but it is only slightly better for all those $f$. For such a scenario, in a certain sense $a_1$ has better head-to-head minimax behavior than $a_2$: there are $f$ for which $a_1$ beats $a_2$ badly, but none for which $a_1$ does substantially worse than $a_2$.

Formally, we say that there exists a head-to-head minimax distinction between two algorithms $a_1$ and $a_2$ iff there exists a $k$ such that, for at least one cost function $f$, the difference $E(\vec{c} \mid f, m, a_1) - E(\vec{c} \mid f, m, a_2) = k$, but there is no other $f$ for which $E(\vec{c} \mid f, m, a_2) - E(\vec{c} \mid f, m, a_1) = k$. A similar definition can be used if one is instead interested in $\Phi(\vec{c})$ or $d_m^y$ rather than $\vec{c}$.

It appears that analyzing head-to-head minimax properties of algorithms is substantially more difficult than analyzing average behavior (as in the NFL theorem). Presently, very little is known about minimax behavior involving stochastic algorithms. In particular, it is not known if there are any senses in which a stochastic version of a deterministic algorithm has better/worse minimax behavior than that deterministic algorithm. In fact, even if we stick completely to deterministic algorithms, only an extremely preliminary understanding of minimax issues has been reached.

What we do know is the following. Consider the quantity

$$\sum_f P_{d_m^y}(z_1, z_2 \mid f, m, a_1, a_2),$$

for deterministic algorithms $a_1$ and $a_2$. (By "$P_A$" is meant the distribution of a random variable $A$, evaluated at $A = a$.) For deterministic algorithms, this quantity is just the number of $f$ such that it is both true that $a_1$ produces a population with $\mathcal{Y}$ components $z_1$ and that $a_2$ produces a population with $\mathcal{Y}$ components $z_2$.

In Appendix F, it is proven by example that this quantity need not be symmetric under interchange of $z_1$ and $z_2$:

Theorem. In general,

$$\sum_f P_{d_m^y}(z_1, z_2 \mid f, m, a_1, a_2) \;\neq\; \sum_f P_{d_m^y}(z_2, z_1 \mid f, m, a_1, a_2).$$

This means that under certain circumstances, even knowing only the $\mathcal{Y}$ components of the populations produced by two algorithms run on the same (unknown) $f$, we can infer something concerning which algorithm produced each population.
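Although Appendix F is not reproduced here, this kind of asymmetry is easy to exhibit by exhaustive enumeration on a toy space. The two branching algorithms below are invented for illustration (they are not the Appendix F construction); both obey the NFL constraint of equal row and column sums, yet their joint counts are not symmetric:

```python
from itertools import product
from collections import Counter

Y = (0, 1)   # |Y| = 2 cost values, |X| = 3 points, m = 2 evaluations

def a1(f):
    # Query x = 0; then query x = 1 if f(0) = 0, else x = 2. Return d_2^y.
    return (f[0], f[1] if f[0] == 0 else f[2])

def a2(f):
    # Query x = 1; then query x = 2 if f(1) = 0, else x = 0.
    return (f[1], f[2] if f[1] == 0 else f[0])

M = Counter()
for f in product(Y, repeat=3):   # all |Y|^|X| = 8 cost functions
    M[(a1(f), a2(f))] += 1       # joint count of (z1, z2) over f

# NFL constraint: every row sum and every column sum equals |Y|^(|X|-m) = 2 ...
rows, cols = Counter(), Counter()
for (z1, z2), n in M.items():
    rows[z1] += n
    cols[z2] += n
assert set(rows.values()) == {2} and set(cols.values()) == {2}

# ... yet the joint counts are NOT symmetric under interchange of z1 and z2:
assert M[(0, 1), (1, 0)] == 2 and M[(1, 0), (0, 1)] == 0
```

Here two functions send $a_1$ to $(0,1)$ while sending $a_2$ to $(1,0)$, but no function does the reverse, so observing the pair of $y$-sequences tells us which algorithm produced which.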

Now consider the quantity

$$\sum_f P_{\vec{C}_1, \vec{C}_2}(z_1, z_2 \mid f, m, a_1, a_2),$$

again for deterministic algorithms $a_1$ and $a_2$. This quantity is just the number of $f$ such that it is both true that $a_1$ produces a histogram $z_1$ and that $a_2$ produces a histogram $z_2$. It too need not be symmetric under interchange of $z_1$ and $z_2$ (see Appendix F). This is a stronger statement than the asymmetry of the $d^y$'s statement, since any particular histogram corresponds to multiple populations.

It would seem that neither of these two results directly implies that there are algorithms $a_1$ and $a_2$ such that for some $f$, $a_1$'s histogram is much better than $a_2$'s, but for no $f$ is the reverse true. To investigate this problem involves looking over all pairs of histograms (one pair for each $f$) such that there is the same relationship between the performances of the algorithms as reflected in the histograms. Simply having an inequality between the sums presented above does not seem to directly imply that the relative performances between the associated pair of histograms is asymmetric. (To formally establish this would involve creating scenarios in which there is an inequality between the sums, but no head-to-head minimax distinctions. Such an analysis is beyond the scope of this paper.)

On the other hand, having the sums be equal does carry obvious implications for whether there are head-to-head minimax distinctions. For example, if both algorithms are deterministic, then for any particular $f$, $P_{d_m^y}(z_1, z_2 \mid f, m, a_1, a_2)$ equals $1$ for one $(z_1, z_2)$ pair and $0$ for all others. In such a case, $\sum_f P_{d_m^y}(z_1, z_2 \mid f, m, a_1, a_2)$ is just the number of $f$ that result in the pair $(z_1, z_2)$. So $\sum_f P_{d_m^y}(z_1, z_2 \mid f, m, a_1, a_2) = \sum_f P_{d_m^y}(z_2, z_1 \mid f, m, a_1, a_2)$ implies that there are no head-to-head minimax distinctions between $a_1$ and $a_2$. The converse does not appear to hold, however.

As a preliminary analysis of whether there can be head-to-head minimax distinctions, we can exploit the result in Appendix F, which concerns the case where $|\mathcal{X}| = |\mathcal{Y}| = 3$. First, define the following performance measures of two-element populations, $Q(d_2^y)$:

i) $Q(y_2, y_3) = Q(y_3, y_2) = 2$;

ii) $Q(y_1, y_2) = Q(y_2, y_1) = 0$;

iii) $Q = 1$ for any other argument.

In Appendix F we show that for this scenario there exist pairs of algorithms $a_1$ and $a_2$ such that for one $f$, $a_1$ generates the histogram $\{y_1, y_2\}$ and $a_2$ generates the histogram $\{y_2, y_3\}$, but there is no $f$ for which the reverse occurs (i.e., there is no $f$ such that $a_1$ generates the histogram $\{y_2, y_3\}$ and $a_2$ generates $\{y_1, y_2\}$).

So in this scenario, with our defined performance measure, there are minimax distinctions between $a_1$ and $a_2$. For one $f$, the performance measures of algorithms $a_1$ and $a_2$ are respectively $0$ and $2$. The difference in the $Q$ values for the two algorithms is $2$ for that $f$. However, there are no other $f$ for which the difference is $-2$. For this $Q$ then, algorithm $a_2$ is minimax superior to algorithm $a_1$.

It is not currently known what restrictions on $Q(d_m^y)$ are needed for there to be minimax distinctions between the algorithms. As an example, it may well be that for $Q(d_m^y) = \min_i \{d_m^y(i)\}$ there are no minimax distinctions between algorithms.

More generally, at present nothing is known about "how big a problem" these kinds of asymmetries are. All of the examples of asymmetry considered here arise when the set of $\mathcal{X}$ values $a_1$ has visited overlaps with those that $a_2$ has visited. Given such overlap, and certain properties of how the algorithms generated the overlap, asymmetry arises. A precise specification of those "certain properties" is not yet in hand. Nor is it known how generic they are, i.e., for what percentage of pairs of algorithms they arise. Although such issues are easy to state (see Appendix F), it is not at all clear how best to answer them.

(Footnote to the claim that equal sums imply no minimax distinctions: Consider the grid of all $(z_1, z_2)$ pairs. Assign to each grid point the number of $f$ that result in that grid point's $(z_1, z_2)$ pair. Then our constraints are: i) by the hypothesis that there are no head-to-head minimax distinctions, if grid point $(z_1, z_2)$ is assigned a non-zero number, then so is $(z_2, z_1)$; and ii) by the no-free-lunch theorem, the sum of all numbers in row $z_1$ equals the sum of all numbers in column $z_1$. These two constraints do not appear to imply that the distribution of numbers is symmetric under interchange of rows and columns. Although, again as before, to formally establish this point would involve explicitly creating search scenarios in which it holds.)

However, consider the case where we are assured that, in $m$ steps, the populations of two particular algorithms have not overlapped. Such assurances hold, for example, if we are comparing two hill-climbing algorithms that start far apart (on the scale of $m$) in $\mathcal{X}$. It turns out that given such assurances, there are no asymmetries between the two algorithms for $m$-element populations. To see this formally, go through the argument used to prove the NFL theorem, but apply that argument to the quantity $\sum_f P_{d_m^y}(z_1, z_2 \mid f, m, a_1, a_2)$ rather than $P(\vec{c} \mid f, m, a)$. Doing this establishes the following:

Theorem. If there is no overlap between $d_{1,m}^x$ and $d_{2,m}^x$, then

$$\sum_f P_{d_m^y}(z_1, z_2 \mid f, m, a_1, a_2) \;=\; \sum_f P_{d_m^y}(z_2, z_1 \mid f, m, a_1, a_2).$$

An immediate consequence of this theorem is that, under the no-overlap conditions, the quantity $\sum_f P_{\vec{C}_1,\vec{C}_2}(z_1, z_2 \mid f, m, a_1, a_2)$ is symmetric under interchange of $z_1$ and $z_2$, as are all distributions determined from this one over $\vec{C}_1$ and $\vec{C}_2$ (e.g., the distribution over the difference between those $\vec{C}$'s extrema).

Note that with stochastic algorithms, if they give non-zero probability to all $d_m^x$, there is always overlap to consider. So there is always the possibility of asymmetry between algorithms if one of them is stochastic.

P(f)-independent results

All work to this point has largely considered the behavior of various algorithms across a wide range of problems. In this section we introduce the kinds of results that can be obtained when we reverse roles and consider the properties of many algorithms on a single problem. More results of this type are found in [WM]. The results of this section, although less sweeping than the NFL results, hold no matter what the real world's distribution over cost functions is.

Let $a_1$ and $a_2$ be two search algorithms. Define a "choosing procedure" as a rule that examines the samples $d_{1,m}$ and $d_{2,m}$, produced by $a_1$ and $a_2$ respectively, and based on those populations decides to use either $a_1$ or $a_2$ for the subsequent part of the search. As an example, one "rational" choosing procedure is to use $a_1$ for the subsequent part of the search if and only if it has generated a lower cost value in its sample than has $a_2$. Conversely, we can consider an "irrational" choosing procedure that went with the algorithm that had not generated the sample with the lowest cost solution.

At the point that a choosing procedure takes effect, the cost function will have been sampled at $d \equiv d_{1,m} \cup d_{2,m}$. Accordingly, if $d_{>m}$ refers to the samples of the cost function that come after using the choosing algorithm, then the user is interested in the remaining sample $d_{>m}$. As always, without loss of generality, it is assumed that the search algorithm chosen by the choosing procedure does not return to any points in $d$.

The following theorem, proven in Appendix G, establishes that there is no a priori justification for using any particular choosing procedure. Loosely speaking, no matter what the cost function, without special consideration of the algorithm at hand, simply observing how well that algorithm has done so far tells us nothing a priori about how well it would do if we continue to use it on the same cost function. For simplicity, in stating the result we only consider deterministic algorithms.

Theorem. Let $d_{1,m}$ and $d_{2,m}$ be two fixed samples of size $m$ that are generated when the algorithms $a_1$ and $a_2$ respectively are run on the (arbitrary) cost function at hand. Let $A$ and $B$ be two different choosing procedures. Let $k$ be the number of elements in $\vec{c}_{>m}$. Then:

$$\sum_{a_1, a_2} P(\vec{c}_{>m} \mid f, d_{1,m}, d_{2,m}, k, a_1, a_2, A) \;=\; \sum_{a_1, a_2} P(\vec{c}_{>m} \mid f, d_{1,m}, d_{2,m}, k, a_1, a_2, B).$$

Implicit in this result is the assumption that the sum excludes those algorithms $a_1$ and $a_2$ that do not result in $d_{1,m}$ and $d_{2,m}$ respectively when run on $f$.

In the precise form it is presented above, the result may appear misleading, since it treats all populations equally, even though for a given $f$ some populations will be more likely than others. However, even if one weights populations according to their probability of occurrence, it is still true that, on average, the choosing procedure one uses has no effect on the likely $\vec{c}_{>m}$. This is established by the following result, proven in Appendix H:

Theorem. Under the conditions given in the preceding theorem,

$$\sum_{a_1, a_2} P(\vec{c}_{>m} \mid f, m, k, a_1, a_2, A) \;=\; \sum_{a_1, a_2} P(\vec{c}_{>m} \mid f, m, k, a_1, a_2, B).$$

These results show that no assumption for $P(f)$ alone justifies using some particular choosing procedure, as far as subsequent search is concerned. To have an intelligent choosing procedure, one must take into account not only $P(f)$ but also the search algorithms one is choosing among. This conclusion may be surprising. In particular, note that it means that there is no intrinsic advantage to using a rational choosing procedure, which continues with the better of $a_1$ and $a_2$, rather than using an irrational choosing procedure which does the opposite.

These results also have interesting implications for degenerate choosing procedures $A \equiv$ {always use algorithm $a_1$} and $B \equiv$ {always use algorithm $a_2$}. As applied to this case, they mean that, for fixed $f_1$ and $f_2$, if $f_1$ does better (on average) with the algorithms in some set $\mathcal{A}$, then $f_2$ does better (on average) with the algorithms in the set of all other algorithms. In particular, if for some favorite algorithms a certain "well-behaved" $f$ results in better performance than does the random $f$, then that well-behaved $f$ gives worse than random behavior on the set of all remaining algorithms. In this sense, just as there are no universally efficacious search algorithms, there are no universally benign $f$ which can be assured of resulting in better-than-random performance regardless of one's algorithm.

(Footnote: $a_1$ can know to avoid the elements it has seen before. However, a priori, $a_1$ has no way to avoid the elements it hasn't seen yet but that $a_2$ has (and vice versa). Rather than have the definition of $a_1$ somehow depend on the elements of $d_{2,m} - d_{1,m}$ (and similarly for $a_2$), we deal with this problem by defining $\vec{c}_{>m}$ to be set only by those elements of $d_{>m}$ that lie outside of $d$. This is similar to the convention we exploited above to deal with potentially retracing algorithms. Formally, this means that the random variable $\vec{c}_{>m}$ is a function of $d$ as well as of $d_{>m}$. It also means there may be fewer elements in the histogram $\vec{c}_{>m}$ than there are in the population $d_{>m}$.)

In fact, things may very well be worse than this. In supervised learning, there is a related result [Wola]. Translated into the current context, that result suggests that if one restricts our sums to only be over those algorithms that are a good match to $P(f)$, then it is often the case that "stupid" choosing procedures (like the irrational procedure of choosing the algorithm with the less desirable $\vec{c}$) outperform "intelligent" ones. What the set of algorithms summed over must be for a rational choosing procedure to be superior to an irrational one is not currently known.

Conclusions

A framework has been presented in which to compare general-purpose optimization algorithms. A number of NFL theorems were derived that demonstrate the danger of comparing algorithms by their performance on a small sample of problems. These same results also indicate the importance of incorporating problem-specific knowledge into the behavior of the algorithm. A geometric interpretation was given showing what it means for an algorithm to be well-suited to solving a certain class of problems. The geometric perspective also suggests a number of measures to compare the similarity of various optimization algorithms.

More direct calculational applications of the NFL theorems were demonstrated by investigating certain information-theoretic aspects of search, as well as by developing a number of benchmark measures of algorithm performance. These benchmark measures should prove useful in practice.

We provided an analysis of the ways that algorithms can differ a priori, despite the NFL theorems. We have also provided an introduction to a variant of the framework that focuses on the behavior of a range of algorithms on specific problems (rather than specific algorithms over a range of problems). This variant leads directly to reconsideration of many issues addressed by computational complexity, as detailed in [WM].

Much future work clearly remains; the reader is directed to [MW] for a list of some of it. Most important is the development of practical applications of these ideas. Can the geometric viewpoint be used to construct new optimization techniques in practice? We believe the answer to be yes. At a minimum, as Markov random field models of landscapes become more widespread, the approach embodied in this paper should find wider applicability.

Acknowledgments

We would like to thank Raja Das, David Fogel, Tal Grossman, Paul Helman, Bennett Levitan, Una-May O'Reilly, and the reviewers for helpful comments and suggestions. WGM thanks the Santa Fe Institute for funding, and DHW thanks the Santa Fe Institute and TXN Inc. for support.

References

[CT] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, New York.

[FOW] L. J. Fogel, A. J. Owens, and M. J. Walsh, Artificial Intelligence through Simulated Evolution, Wiley, New York.

[Glo1] F. Glover, ORSA J. Comput.

[Glo2] F. Glover, ORSA J. Comput.

[Gri] D. Griffeath, "Introduction to random fields", Springer-Verlag, New York.

[Hol] J. H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, MA.

[KGV] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing", Science.

[KS] R. Kinderman and J. L. Snell, Markov Random Fields and Their Applications, American Mathematical Society, Providence.

[LW] E. L. Lawler and D. E. Wood, Operations Research.

[MW] W. G. Macready and D. H. Wolpert, "What makes an optimization problem hard?", Complexity.

[WM] D. H. Wolpert and W. G. Macready, "No free lunch theorems for search", Technical Report SFI-TR-95-02-010, Santa Fe Institute, available via ftp from ftp.santafe.edu.

[Wola] D. H. Wolpert, "The lack of a priori distinctions between learning algorithms and the existence of a priori distinctions between learning algorithms", Neural Computation.

[Wolb] D. H. Wolpert, "On bias plus variance", Neural Computation, in press.

A NFL proof for static cost functions

We show that $\sum_f P(d^y_m \mid f, m, a)$ has no dependence on $a$. Conceptually the proof is quite simple, but the necessary bookkeeping complicates things, lengthening the proof considerably. The intuition behind the proof is simple though: by summing over all $f$ we ensure that the past performance of an algorithm has no bearing on its future performance. Accordingly, under such a sum, all algorithms perform equally.

The proof is by induction. The induction is based on $m = 1$, and the inductive step is based on breaking $f$ into two independent parts, one for $x \in d^x_m$ and one for $x \notin d^x_m$. These are evaluated separately, giving the desired result.

For $m = 1$ we write the sample as $d_1 = \{(d^x_1, d^y_1)\}$, where $d^x_1$ is set by $a$. The only possible value for $d^y_1$ is $f(d^x_1)$, so we have

$$\sum_f P(d^y_1 \mid f, m = 1, a) = \sum_f \delta\big(d^y_1, f(d^x_1)\big),$$

where $\delta$ is the Kronecker delta function.

Summing over all possible cost functions, $\delta(d^y_1, f(d^x_1))$ is 1 only for those functions which have cost $d^y_1$ at point $d^x_1$. Therefore that sum equals $|Y|^{|X|-1}$, independent of $d^x_1$:

$$\sum_f P(d^y_1 \mid f, m = 1, a) = |Y|^{|X|-1},$$

which is independent of $a$. This bases the induction.

The inductive step requires that if $\sum_f P(d^y_m \mid f, m, a)$ is independent of $a$ for all $d^y_m$, then so also is $\sum_f P(d^y_{m+1} \mid f, m+1, a)$. Establishing this step completes the proof.

We begin by writing

$$P(d^y_{m+1} \mid f, m+1, a) = P\big(d^y_{m+1}(m+1), d^y_m \mid f, m+1, a\big)$$
$$= P\big(d^y_{m+1}(m+1) \mid d^y_m, f, m+1, a\big)\, P(d^y_m \mid f, m, a),$$

and thus

$$\sum_f P(d^y_{m+1} \mid f, m+1, a) = \sum_f P\big(d^y_{m+1}(m+1) \mid d^y_m, f, m+1, a\big)\, P(d^y_m \mid f, m, a).$$

The new $y$ value, $d^y_{m+1}(m+1)$, will depend on the new $x$ value, $f$, and nothing else. So we expand over these possible $x$ values, obtaining

$$\sum_f P(d^y_{m+1} \mid f, m+1, a) = \sum_{f,x} P\big(d^y_{m+1}(m+1) \mid f, x\big)\, P(x \mid d^y_m, f, m, a)\, P(d^y_m \mid f, m, a)$$
$$= \sum_{f,x} \delta\big(d^y_{m+1}(m+1), f(x)\big)\, P(x \mid d^y_m, f, m, a)\, P(d^y_m \mid f, m, a).$$

Next note that since $x = a(d^x_m, d^y_m)$, it does not depend directly on $f$. Consequently we expand in $d^x_m$ to remove the $f$ dependence in $P(x \mid d^y_m, f, m, a)$:

$$\sum_f P(d^y_{m+1} \mid f, m+1, a) = \sum_{f, x, d^x_m} \delta\big(d^y_{m+1}(m+1), f(x)\big)\, P(x \mid d_m)\, P(d^x_m \mid d^y_m, f, m, a)\, P(d^y_m \mid f, m, a)$$
$$= \sum_{f, d^x_m} \delta\big(d^y_{m+1}(m+1), f(a(d_m))\big)\, P(d_m \mid f, m, a),$$

where use was made of the fact that $P(x \mid d_m, a) = \delta(x, a(d_m))$ and the fact that $P(d^x_m \mid d^y_m, f, m, a)\, P(d^y_m \mid f, m, a) = P(d_m \mid f, m, a)$.

The sum over cost functions $f$ is done first. The cost function is defined both over those points restricted to $d^x_m$ and those points outside of $d^x_m$. $P(d_m \mid f, m, a)$ will depend on the $f$ values defined over points inside $d^x_m$, while $\delta(d^y_{m+1}(m+1), f(a(d_m)))$ depends only on the $f$ values defined over points outside $d^x_m$. (Recall that $a(d_m) \notin d^x_m$.) So we have

$$\sum_f P(d^y_{m+1} \mid f, m+1, a) = \sum_{d^x_m}\, \sum_{f(x \in d^x_m)} P(d_m \mid f, m, a) \sum_{f(x \notin d^x_m)} \delta\big(d^y_{m+1}(m+1), f(a(d_m))\big).$$

The sum $\sum_{f(x \notin d^x_m)}$ contributes a constant, $|Y|^{|X|-m-1}$, equal to the number of functions defined over points not in $d^x_m$ that pass through $\big(a(d_m),\, d^y_{m+1}(m+1)\big)$. So

$$\sum_f P(d^y_{m+1} \mid f, m+1, a) = |Y|^{|X|-m-1} \sum_{d^x_m}\, \sum_{f(x \in d^x_m)} P(d_m \mid f, m, a)$$
$$= \frac{1}{|Y|} \sum_{f, d^x_m} P(d_m \mid f, m, a)$$
$$= \frac{1}{|Y|} \sum_f P(d^y_m \mid f, m, a).$$

By hypothesis, the right-hand side of this equation is independent of $a$, so the left-hand side must also be. This completes the proof.
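The induction above can be checked by brute force on a tiny search space. The sketch below is purely illustrative; the two deterministic algorithms and the sizes of $X$ and $Y$ are arbitrary choices, not anything from the text. It sums over every cost function $f : X \to Y$ and confirms that the distribution of observed cost sequences $d^y_m$ is identical for two quite different algorithms, exactly as the theorem requires.

```python
from itertools import product
from collections import Counter

X = range(4)          # search space
Y = range(3)          # cost values
m = 3                 # number of distinct points sampled

def run(algo, f, m):
    """Run a deterministic search algorithm on cost function f,
    returning the sequence of observed cost values d^y_m."""
    d_x, d_y = [], []
    for _ in range(m):
        x = algo(d_x, d_y)          # next point, never revisited
        d_x.append(x)
        d_y.append(f[x])
    return tuple(d_y)

def sequential(d_x, d_y):
    # visit x = 0, 1, 2, ... in canonical order
    return len(d_x)

def adaptive(d_x, d_y):
    # a toy "informed" rule: the next unvisited x is steered
    # by the sum of the costs seen so far
    unvisited = [x for x in X if x not in d_x]
    return unvisited[sum(d_y) % len(unvisited)]

# sum over ALL cost functions f : X -> Y
counts = {}
for algo in (sequential, adaptive):
    c = Counter(run(algo, f, m) for f in product(Y, repeat=len(X)))
    counts[algo.__name__] = c

# NFL: under the uniform sum over f, every d^y_m sequence occurs
# |Y|^(|X|-m) times for EITHER algorithm
assert counts["sequential"] == counts["adaptive"]
print(counts["sequential"][(0, 0, 0)], "functions give d^y = (0,0,0) for each algorithm")
```

Each sequence occurs $|Y|^{|X|-m} = 3$ times here, matching the base-case count derived above.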

B NFL proof for time-dependent cost functions

In analogy with the proof of the static NFL theorem, the proof for the time-dependent case proceeds by establishing the $a$-independence of the sum $\sum_T P(c \mid f, T, m, a)$, where here $c$ is either $d^y_m$ or $D^y_m$.

To begin, replace each $T$ in this sum with a set of cost functions, $f_i$, one for each iteration of the algorithm. To do this, we start with the following:

$$\sum_T P(c \mid f_1, T, m, a) = \sum_T \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(c \mid \vec f, d^x_m)\, P(\vec f, d^x_m \mid f_1, T, m, a)$$
$$= \sum_{d^x_m} \sum_{f_2 \cdots f_m} \sum_T P(c \mid \vec f, d^x_m)\, P(d^x_m \mid \vec f, m, a)\, P(f_2 \cdots f_m \mid f_1, T),$$

where the sequence of cost functions $f_i$ has been indicated by the vector $\vec f = (f_1, \ldots, f_m)$.

In the next step, the sum over all possible $T$ is decomposed into a series of sums. Each sum in the series is over the values $T$ can take for one particular iteration of the algorithm. More formally, using $f_{i+1} = T_i(f_i)$, we write

$$\sum_T P(c \mid f_1, T, m, a) = \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(c \mid \vec f, d^x_m)\, P(d^x_m \mid \vec f, m, a) \sum_{T_1 \cdots T_{m-1}} \delta(f_2, T_1 f_1) \cdots \delta(f_m, T_{m-1} f_{m-1}).$$

Note that $P(c \mid \vec f, d^x_m)$ is independent of the values of $T_m, T_{m+1}, \ldots$, so those values can be absorbed into an overall $a$-independent proportionality constant.

Consider the innermost sum over $T_{m-1}$, for fixed values of the outer sum indices $T_1 \cdots T_{m-2}$. For fixed values of the outer indices, $T_{m-1} f_{m-1}$ is just a particular fixed cost function. Accordingly, the innermost sum over $T_{m-1}$ is simply the number of bijections of $\mathcal F$ that map that fixed cost function to $f_m$. This is the constant $(|\mathcal F| - 1)!$. Consequently, evaluating the $T_{m-1}$ sum yields

$$\sum_T P(c \mid f_1, T, m, a) \propto \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(c \mid \vec f, d^x_m)\, P(d^x_m \mid \vec f, m, a) \sum_{T_1 \cdots T_{m-2}} \delta(f_2, T_1 f_1) \cdots \delta(f_{m-1}, T_{m-2} f_{m-2}).$$

The sum over $T_{m-2}$ can be accomplished in the same manner $T_{m-1}$ was summed over. In fact, all the sums over all $T_i$ can be done, leaving

$$\sum_T P(c \mid f_1, T, m, a) \propto \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(c \mid \vec f, d^x_m)\, P(d^x_m \mid \vec f, m, a)$$
$$= \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(c \mid \vec f, d^x_m)\, P(d^x_m \mid f_1 \cdots f_{m-1}, m, a).$$

In this last step, the statistical independence of $d^x_m$ and $f_m$ has been used: the sample's $x$ values are fixed before $f_m$ ever acts.

Further progress depends on whether $c$ represents $d^y_m$ or $D^y_m$. We begin with analysis of the $D^y_m$ case. For this case, $P(c \mid \vec f, d^x_m) = P(D^y_m \mid f_m, d^x_m)$, since $D^y_m$ only reflects cost values from the last cost function, $f_m$. Using this result gives

$$\sum_T P(D^y_m \mid f_1, T, m, a) \propto \sum_{d^x_m} \sum_{f_2 \cdots f_{m-1}} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \sum_{f_m} P(D^y_m \mid f_m, d^x_m).$$

The final sum over $f_m$ is a constant, equal to the number of ways of generating the sample $D^y_m$ from cost values drawn from a function over $X$. The important point is that it is independent of the particular $d^x_m$. Because of this, the sum over $d^x_m$ can be evaluated, eliminating the $a$ dependence:

$$\sum_T P(D^y_m \mid f_1, T, m, a) \propto \sum_{f_2 \cdots f_{m-1}} \sum_{d^x_m} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \propto 1.$$

This completes the proof of the theorem for the case of $D^y_m$.

The proof is completed by turning to the $d^y_m$ case. This is considerably more difficult, since $P(c \mid \vec f, d^x_m)$ cannot be simplified so that the sums over the $f_i$ decouple. Nevertheless, the NFL result still holds. This is proven by expanding the previous equation over possible $d^y_m$ values:

$$\sum_T P(d^y_m \mid f_1, T, m, a) \propto \sum_{\bar d^y_m} P(d^y_m \mid \bar d^y_m) \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \prod_i \delta\big(\bar d^y_m(i), f_i(d^x_m(i))\big).$$

The innermost sum over $f_m$ only has an effect on the $\delta(\bar d^y_m(m), f_m(d^x_m(m)))$ term, so it contributes $\sum_{f_m} \delta(\bar d^y_m(m), f_m(d^x_m(m)))$. This is a constant, equal to $|Y|^{|X|-1}$. This leaves

$$\sum_T P(d^y_m \mid f_1, T, m, a) \propto \sum_{\bar d^y_m} P(d^y_m \mid \bar d^y_m) \sum_{d^x_m} \sum_{f_2 \cdots f_{m-1}} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \prod_{i=1}^{m-1} \delta\big(\bar d^y_m(i), f_i(d^x_m(i))\big).$$

The sum over $d^x_m(m)$ is now simple:

$$\sum_T P(d^y_m \mid f_1, T, m, a) \propto \sum_{\bar d^y_m} P(d^y_m \mid \bar d^y_m) \sum_{d^x_m(1) \cdots d^x_m(m-1)} \sum_{f_2 \cdots f_{m-1}} P\big(d^x_m(1) \cdots d^x_m(m-1) \mid f_1 \cdots f_{m-1}, m, a\big) \prod_{i=1}^{m-1} \delta\big(\bar d^y_m(i), f_i(d^x_m(i))\big).$$

The above equation is of the same form as the one preceding it, only with a remaining population of size $m-1$ rather than $m$. Consequently, in an analogous manner to the scheme used to evaluate the sums over $f_m$ and $d^x_m(m)$, the sums over $f_{m-1}$ and $d^x_m(m-1)$ can be evaluated. Doing so simply generates more $a$-independent proportionality constants. Continuing in this manner, all sums over the $f_i$ can be evaluated, to find

$$\sum_T P(d^y_m \mid f_1, T, m, a) \propto \sum_{\bar d^y_m(1)} P\big(d^y_m \mid \bar d^y_m(1)\big)\, P\big(d^x_m(1) \mid m, a\big)\, \delta\big(\bar d^y_m(1), f_1(d^x_m(1))\big).$$

There is algorithm-dependence in this result, but it is the trivial dependence discussed previously: it arises from how the algorithm selects the first $x$ point in its population, $d^x_m(1)$. Restricting interest to those points in the sample that are generated subsequent to the first, this result shows that there are no distinctions between algorithms. Alternatively, summing over the initial cost function $f_1$, all points in the sample could be considered while still retaining an NFL result.
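As with the static case, the time-dependent result can be verified exhaustively on a tiny example. The sketch below is illustrative only; the sizes of $X$ and $Y$ and the two algorithms are arbitrary choices. It sums over every initial cost function $f_1$ and every bijection $T$ of $\mathcal F$, with each sampled cost $d^y_m(i)$ drawn from $f_i$, and checks that the resulting distribution of cost sequences is algorithm-independent (this is the "summing over the initial cost function $f_1$" form of the result, so even the first point carries no algorithm dependence).

```python
from itertools import product, permutations
from collections import Counter

X, Y = range(3), range(2)
F = list(product(Y, repeat=len(X)))        # all |Y|^|X| = 8 cost functions

def run(algo, f1, f2):
    # two-step time-dependent search: the cost function changes
    # between the first and second evaluations
    x1 = algo([], [])
    y1 = f1[x1]
    x2 = algo([x1], [y1])
    return (y1, f2[x2])

def sequential(d_x, d_y):
    return len(d_x)                        # visit 0, 1, ... in order

def adaptive(d_x, d_y):
    # next unvisited point steered by the costs seen so far
    unvisited = [x for x in X if x not in d_x]
    return unvisited[sum(d_y) % len(unvisited)]

counts = {}
for algo in (sequential, adaptive):
    c = Counter()
    for T in permutations(range(len(F))):  # all bijections of F
        for i, f1 in enumerate(F):
            c[run(algo, f1, F[T[i]])] += 1
    counts[algo.__name__] = c

# summed over f1 and T, the two algorithms are indistinguishable
assert counts["sequential"] == counts["adaptive"]
```

Every cost sequence occurs $|F| \cdot |F|! \,/\, |Y|^2 = 80640$ times for either algorithm.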

C Proof of the $\rho_f$ result

As noted in the discussion leading up to the theorem, the fraction of functions giving a specified histogram $\vec c$ (with $m = \sum_i c_i$) is independent of the algorithm. Consequently, a simple algorithm is used to prove the theorem. The algorithm visits points in $X$ in some canonical order, say $x_1, x_2, \ldots, x_m$. Recall that the histogram $\vec c$ is specified by giving the frequencies of occurrence, across the $x_1, \ldots, x_m$, for each of the $|Y|$ possible cost values. The number of $f$'s giving the desired histogram under this algorithm is just the multinomial giving the number of ways of distributing the cost values in $\vec c$; at the remaining $|X| - m$ points in $X$ the cost can assume any of the $|Y|$ values. This gives the first result of the theorem.

The expression of $\rho_f$ in terms of the entropy of $\vec c$ follows from an application of Stirling's approximation to order $O(1/m)$, which is valid when all of the $c_i$ are large. In this case the multinomial is written

$$\ln \binom{m}{c_1 \cdots c_{|Y|}} \simeq m \ln m - \sum_{i=1}^{|Y|} c_i \ln c_i + \frac{1}{2}\Big(\ln m - \sum_{i=1}^{|Y|} \ln c_i\Big)$$
$$= m\, S(\vec \beta) + \frac{1 - |Y|}{2}\, \ln m - \frac{1}{2} \sum_{i=1}^{|Y|} \ln \beta_i,$$

where $\vec\beta = \vec c / m$ and $S(\vec\beta) = -\sum_i \beta_i \ln \beta_i$ is its entropy, from which the theorem follows by exponentiating this result.
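The quality of the Stirling form is easy to check numerically. The sketch below uses an arbitrary histogram with all $c_i$ large; it also restores the $m$-independent constant $-\tfrac{|Y|-1}{2}\ln(2\pi)$ that Stirling's formula supplies (dropped in the derivation above, where only the $m$-dependence matters), so the two values agree to within the $O(1/m)$ error.

```python
import math

def exact_log_count(c):
    """ln of the multinomial m!/(c_1! ... c_|Y|!) counting the ways to
    distribute cost values with histogram c over m sampled points."""
    m = sum(c)
    return math.lgamma(m + 1) - sum(math.lgamma(ci + 1) for ci in c)

def stirling_log_count(c):
    """m*S(beta) + (1-|Y|)/2 * ln m - 1/2 * sum ln beta_i,
    plus the constant -(|Y|-1)/2 * ln(2*pi) from Stirling's formula."""
    m = sum(c)
    beta = [ci / m for ci in c]
    S = -sum(b * math.log(b) for b in beta)      # entropy in nats
    return (m * S
            + (1 - len(c)) / 2 * math.log(m)
            - 0.5 * sum(math.log(b) for b in beta)
            - (len(c) - 1) / 2 * math.log(2 * math.pi))

c = [300, 500, 200]      # histogram with all c_i large, m = 1000
print(exact_log_count(c), stirling_log_count(c))
```

For this histogram the two log-counts differ by less than $10^{-2}$, consistent with the $O(1/m)$ claim.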

D Proof of the $\rho_{alg}$ result

In this section the proportion of all algorithms that give a particular $\vec c$ for a particular $f$ is calculated. The calculation proceeds in several steps:

1. Since $X$ is finite, there are a finite number of different samples. Therefore any deterministic $a$ is a huge, but finite, list indexed by all possible $d$'s. Each entry in the list is the $x$ the $a$ in question outputs for that $d$-index.

2. Consider any particular unordered set of $m$ $(X, Y)$ pairs where no two of the pairs share the same $x$ value. Such a set is called an unordered path $\pi$. Without loss of generality, from now on we implicitly restrict the discussion to unordered paths of length $m$. A particular $\pi$ is "in" or "from" a particular $f$ if there is an unordered set of $m$ $(x, f(x))$ pairs identical to $\pi$. The numerator on the right-hand side of the equation being proven is the number of unordered paths in the given $f$ that give the desired $\vec c$.

3. Claim: that number of unordered paths is proportional to the number of $a$'s that give the desired $\vec c$ for $f$, and the proof of this claim constitutes a proof of the equation. Furthermore, the proportionality constant is independent of $f$ and $\vec c$.

4. Proof: The proof is established by constructing a mapping $\phi : a \to \pi$, taking in an $a$ that gives the desired $\vec c$ for $f$ and producing a $\pi$ that is in $f$ and gives the desired $\vec c$. Showing that for any $\pi$ the number of algorithms $a$ such that $\phi(a) = \pi$ is a constant, independent of $\pi$, $f$, and $\vec c$, and that $\phi$ is single-valued, will complete the proof.

Recalling that every $x$ value in an unordered path is distinct, any unordered path $\pi$ gives a set of $m!$ different ordered paths. Each such ordered path $\pi_{ord}$ in turn provides a set of $m$ successive $d$'s (the empty $d$ is included) and a following $x$. Indicate by $d(\pi_{ord})$ this set of the first $m$ $d$'s provided by $\pi_{ord}$.

From any ordered path $\pi_{ord}$ a "partial algorithm" can be constructed. This consists of the list of an $a$, but with only the $m$ $d(\pi_{ord})$ entries in the list filled in; the remaining entries are blank. Since there are $m!$ distinct partial $a$'s for each $\pi$ (one for each ordered path corresponding to $\pi$), there are $m!$ such partially filled-in lists for each $\pi$. A partial algorithm may or may not be consistent with a particular full algorithm. This allows the definition of the inverse of $\phi$: for any $\pi$ that is in $f$ and gives $\vec c$, $\phi^{-1}(\pi)$ is the set of all $a$ that are consistent with at least one partial algorithm generated from $\pi$ and that give $\vec c$ when run on $f$.

To complete the first part of the proof, it must be shown that for all $\pi$ that are in $f$ and give $\vec c$, $\phi^{-1}(\pi)$ contains the same number of elements, regardless of $\pi$, $f$, or $\vec c$. To that end, first generate all ordered paths induced by $\pi$, and then associate each such ordered path with a distinct $m$-element partial algorithm. Now, how many full algorithm lists are consistent with at least one of these partial algorithm lists? How this question is answered is the core of this appendix. To answer this question, reorder the entries in each of the partial algorithm lists by permuting the indices $d$ of all the lists. Obviously such a reordering won't change the answer to our question.

Reordering is accomplished by interchanging pairs of $d$ indices. First, interchange any $d$ index of the form $\big((d^x_m(1), d^y_m(1)), \ldots, (d^x_m(i), d^y_m(i))\big)$ ($i \le m$) whose entry is filled in in any of our partial algorithm lists with the index $\big((d^x_m(1), z), \ldots, (d^x_m(i), z)\big)$, where $z$ is some arbitrary constant $Y$ value. Next, create some arbitrary but fixed ordering of all $x \in X$: $(x_1, \ldots, x_{|X|})$. Then interchange any $d$ index of the form $\big((d^x_m(1), z), \ldots, (d^x_m(i), z)\big)$ ($i \le m$) whose entry is filled in in any of our new partial algorithm lists with $\big((x_1, z), \ldots, (x_i, z)\big)$, where $x_j$ refers to the $j$-th element of $X$ under that ordering. (Recall that all the $d^x_m(i)$ must be distinct.)

By construction, the resultant partial algorithm lists are independent of $\pi$, $\vec c$, and $f$, as is the number of such lists ($m!$). Therefore the number of algorithms consistent with at least one partial algorithm list in $\phi^{-1}(\pi)$ is independent of $\pi$, $\vec c$, and $f$. This completes the first part of the proof.

For the second part, first choose any two unordered paths that differ from one another, $A$ and $B$. There is no ordered path $A_{ord}$ constructed from $A$ that equals an ordered path $B_{ord}$ constructed from $B$. So choose any such $A_{ord}$ and any such $B_{ord}$. If they disagree for the null $d$, then we know that there is no (deterministic) $a$ that agrees with both of them. If they agree for the null $d$, then since they are sampled from the same $f$, they have the same single-element $d$. If they disagree for that $d$, then there is no $a$ that agrees with both of them. If they agree for that $d$, then they have the same double-element $d$. Continue in this manner all the way up to the $(m-1)$-element $d$. Since the two ordered paths differ, they must have disagreed at some point by now, and therefore there is no $a$ that agrees with both of them. Since this is true for any $A_{ord}$ from $A$ and any $B_{ord}$ from $B$, we see that there is no $a$ in $\phi^{-1}(A)$ that is also in $\phi^{-1}(B)$. This completes the proof.

To show the relation to the Kullback-Leibler distance, the product of binomials is expanded with the aid of Stirling's approximation (when both the $N_i$ and the $c_i$ are large):

$$\ln \prod_{i=1}^{|Y|} \binom{N_i}{c_i} \simeq \sum_{i=1}^{|Y|} \Big[ N_i \ln N_i - c_i \ln c_i - (N_i - c_i)\ln(N_i - c_i) \Big],$$

where it has been assumed that $c_i \ll N_i$, which is reasonable when $m \ll |X|$. Expanding $\ln(1 - z) \simeq -z - z^2/2$ to second order gives

$$\ln \prod_{i=1}^{|Y|} \binom{N_i}{c_i} \simeq \sum_{i=1}^{|Y|} \Big[ c_i \ln \frac{N_i}{c_i} + c_i - \frac{c_i^2}{2 N_i} \Big].$$

Using $m \ll |X|$ and writing this in terms of $\vec\beta = \vec c/m$ and $\vec\alpha = \vec N/|X|$, one finds

$$\ln \prod_{i=1}^{|Y|} \binom{N_i}{c_i} \simeq -m\, D_{KL}(\vec\beta, \vec\alpha) - m \ln \frac{m}{|X|} + m - \frac{m^2}{2|X|} \sum_{i=1}^{|Y|} \frac{\beta_i^2}{\alpha_i},$$

where $D_{KL}(\vec\beta, \vec\alpha) \equiv \sum_i \beta_i \ln(\beta_i/\alpha_i)$ is the Kullback-Leibler distance between the distributions $\vec\beta$ and $\vec\alpha$. Exponentiating this expression yields the second result in the theorem.
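This Kullback-Leibler form can be spot-checked numerically. In the sketch below all numbers are arbitrary choices satisfying $c_i \ll N_i$; the exact value $\ln \prod_i \binom{N_i}{c_i}$ is compared with the approximation $-m D_{KL}(\vec\beta, \vec\alpha) - m\ln(m/|X|) + m - \frac{m^2}{2|X|}\sum_i \beta_i^2/\alpha_i$, and the two agree up to the logarithmic Stirling corrections dropped in the derivation.

```python
import math

def exact_log_binom_product(N, c):
    # ln prod_i C(N_i, c_i), computed exactly via log-gamma
    return sum(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               for n, k in zip(N, c))

def kl_approximation(N, c):
    # -m*D_KL(beta, alpha) - m*ln(m/|X|) + m - m^2/(2|X|) * sum beta_i^2/alpha_i
    X, m = sum(N), sum(c)
    alpha = [n / X for n in N]          # histogram of f, normalized
    beta = [k / m for k in c]           # sample histogram, normalized
    d_kl = sum(b * math.log(b / a) for b, a in zip(beta, alpha))
    return (-m * d_kl - m * math.log(m / X) + m
            - m * m / (2 * X) * sum(b * b / a for b, a in zip(beta, alpha)))

N = [3000, 5000, 2000]   # histogram of f over |X| = 10000 points
c = [20, 25, 15]         # sample histogram, m = 60, with c_i << N_i
print(exact_log_binom_product(N, c), kl_approximation(N, c))
```

For these values the relative discrepancy is a couple of percent, of the size expected from the neglected terms.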

E Benchmark measures of performance

The result for each benchmark measure is established in turn.

The first measure is $\sum_f P(\min(d^y_m) > \epsilon \mid f, m, a)$. Consider

$$\sum_f P(d^y_m \mid f, m, a),$$

for which the summand equals 0 or 1 for all $f$ and deterministic $a$. It is 1 only if

i) $f(d^x_m(1)) = d^y_m(1)$;

ii) $f\big(a(d_1)\big) = d^y_m(2)$;

iii) $f\big(a(d_2)\big) = d^y_m(3)$;

and so on, where $d_i$ denotes the sample after $i$ steps. These restrictions fix the value of $f(x)$ at $m$ points, while $f$ remains free at all other points. Therefore

$$\sum_f P(d^y_m \mid f, m, a) = |Y|^{|X| - m}.$$

Using this result, we find

$$\sum_f P(\min(d^y_m) > \epsilon \mid f, m, a) = \sum_{d^y_m} P(\min(d^y_m) > \epsilon \mid d^y_m) \sum_f P(d^y_m \mid f, m, a) = |Y|^{|X|-m}\, (|Y| - \epsilon)^m,$$

since, with $Y = \{1, \ldots, |Y|\}$, exactly $(|Y| - \epsilon)^m$ of the $d^y_m$ have all components exceeding $\epsilon$. Dividing by $|Y|^{|X|}$ gives the fraction of functions, $(1 - \epsilon/|Y|)^m$, which is the result quoted in the theorem.

In the limit as $|Y|$ gets large, write $E(\min(\vec c) \mid f, m)$ as a sum over $\epsilon$ of $\epsilon$ times the probability that the minimum equals $\epsilon$, and substitute in the result just derived. Replacing $\epsilon/|Y|$ with $x$ turns the sum into one over evenly spaced $x$ values. Next write $|Y| = b/\delta$ for some $b$, and multiply and divide the summand by $\delta$; since $|Y| \to \infty$, $\delta \to 0$. To take the limit $\delta \to 0$, apply L'Hopital's rule to the ratio in the summand, and use the fact that $\delta \to 0$ to cancel terms in the summand. Carrying through the algebra, one is left with a Riemann sum of the form $\frac{m}{b}\int_0^b dx\; x\, (1 - x/b)^{m-1}$. Evaluating the integral (it equals $b/(m+1)$) gives the second result in the theorem.

The second benchmark concerns the behavior of the random algorithm $\tilde a$. Marginalizing over the $Y$ values of different histograms $\vec c$, the performance of $\tilde a$ is

$$P(\min(\vec c) > \epsilon \mid f, m, \tilde a) = \sum_{\vec c} P(\min(\vec c) > \epsilon \mid \vec c)\, P(\vec c \mid f, m, \tilde a).$$

Now $P(\vec c \mid f, m, \tilde a)$ is the probability of obtaining histogram $\vec c$ in $m$ random draws from the histogram $\vec N$ of the function $f$. (This can be viewed as the definition of $\tilde a$.) This probability has been calculated previously as $\prod_{i=1}^{|Y|} \binom{N_i}{c_i} \big/ \binom{|X|}{m}$. So

$$P(\min(\vec c) > \epsilon \mid f, m, \tilde a) = \sum_{\vec c} P(\min(\vec c) > \epsilon \mid \vec c)\; \frac{\prod_{i=1}^{|Y|} \binom{N_i}{c_i}}{\binom{|X|}{m}}$$
$$= \sum_{c_{\epsilon+1}, \ldots, c_{|Y|}} \frac{\prod_{i > \epsilon} \binom{N_i}{c_i}}{\binom{|X|}{m}}$$
$$= \frac{\binom{\sum_{i > \epsilon} N_i}{m}}{\binom{|X|}{m}},$$

where the middle sum runs over histograms with $\sum_{i > \epsilon} c_i = m$, and the last equality is the Vandermonde convolution. This is the final equation of the theorem.
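The closed form for the random algorithm, a ratio of binomial coefficients, can be confirmed by enumerating every unordered sample directly. The sketch below uses an arbitrary small cost function; the values of $m$ and $\epsilon$ are likewise arbitrary.

```python
from itertools import combinations
from math import comb

# an arbitrary cost function f over |X| = 8 points, Y = {0,1,2,3}
f = [0, 1, 1, 2, 3, 3, 3, 2]
m, eps = 3, 1            # sample size and threshold

# histogram N_i of f: how many points have cost i
N = [f.count(i) for i in range(4)]

# closed form: P(min of sample > eps) = C(sum_{i>eps} N_i, m) / C(|X|, m)
closed = comb(sum(N[eps + 1:]), m) / comb(len(f), m)

# brute force over all C(|X|, m) equally likely unordered samples
# that the random algorithm can draw
hits = sum(1 for s in combinations(range(len(f)), m)
           if min(f[x] for x in s) > eps)
brute = hits / comb(len(f), m)

assert abs(closed - brute) < 1e-12
print("closed form and enumeration agree:", closed)
```

Here 5 of the 8 points have cost above $\epsilon = 1$, so both computations give $\binom{5}{3}/\binom{8}{3} = 10/56$.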

F Proof related to minimax distinctions between algorithms

The proof is by example.

Consider three points in $X$, $x_1$, $x_2$, and $x_3$, and three points in $Y$, $y_1$, $y_2$, and $y_3$.

1) Let the first point $a_1$ visits be $x_1$, and the first point $a_2$ visits be $x_2$.

2) If at its first point $a_1$ sees a $y_1$ or a $y_2$, it jumps to $x_2$. Otherwise it jumps to $x_3$.

3) If at its first point $a_2$ sees a $y_1$, it jumps to $x_1$. If it sees a $y_2$, it jumps to $x_3$.

Consider the cost function that has as the $Y$ values for the three $X$ values $\{y_1, y_2, y_3\}$, respectively.

For $m = 2$, $a_1$ will produce a population $(y_1, y_2)$ for this function, and $a_2$ will produce $(y_2, y_3)$.

The proof is completed if we show that there is no cost function so that $a_1$ produces a population containing $y_2$ and $y_3$ and such that $a_2$ produces a population containing $y_1$ and $y_2$.

There are four possible pairs of populations to consider:

i) $[(y_2, y_3);\ (y_1, y_2)]$

ii) $[(y_2, y_3);\ (y_2, y_1)]$

iii) $[(y_3, y_2);\ (y_1, y_2)]$

iv) $[(y_3, y_2);\ (y_2, y_1)]$

Since if its first point is a $y_2$ $a_1$ jumps to $x_2$, which is where $a_2$ starts, when $a_1$'s first point is a $y_2$ its second point must equal $a_2$'s first point. This rules out possibilities i) and ii).

For possibilities iii) and iv), by $a_1$'s population we know that $f$ must be of the form $\{y_3, s, y_2\}$, for some variable $s$. For case iii), $s$ would need to equal $y_1$, due to the first point in $a_2$'s population. However, for that case, the second point $a_2$ sees would be the value at $x_1$, which is $y_3$, contrary to hypothesis.

For case iv), we know that $s$ would have to equal $y_2$, due to the first point in $a_2$'s population. However, that would mean that $a_2$ jumps to $x_3$ for its second point, and would therefore see a $y_2$, contrary to hypothesis.

Accordingly, none of the four cases is possible. This is a case both where there is no symmetry under exchange of $d^y$'s between $a_1$ and $a_2$, and no symmetry under exchange of histograms. QED.
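The four-case argument can be confirmed by exhaustive enumeration. The sketch below encodes the two algorithms as described; the behavior of $a_2$ on seeing a $y_3$, which the construction leaves unspecified, is fixed arbitrarily and plays no role in the argument. All 27 cost functions are then checked.

```python
from itertools import product

Y1, Y2, Y3 = 1, 2, 3     # the three cost values
X1, X2, X3 = 0, 1, 2     # the three points

def a1(f):
    # a1 starts at x1; on seeing y1 or y2 it jumps to x2, else to x3
    first = f[X1]
    second = f[X2] if first in (Y1, Y2) else f[X3]
    return (first, second)

def a2(f):
    # a2 starts at x2; on y1 it jumps to x1, on y2 to x3
    # (behavior on y3 is unspecified in the text; x1 chosen arbitrarily)
    first = f[X2]
    second = f[X1] if first == Y1 else f[X3] if first == Y2 else f[X1]
    return (first, second)

# for f = (y1, y2, y3): a1 yields (y1, y2) and a2 yields (y2, y3)
assert a1((Y1, Y2, Y3)) == (Y1, Y2)
assert a2((Y1, Y2, Y3)) == (Y2, Y3)

# no cost function reverses the roles, i.e. gives a1 the pair
# {y2, y3} while giving a2 the pair {y1, y2}
clash = [f for f in product((Y1, Y2, Y3), repeat=3)
         if set(a1(f)) == {Y2, Y3} and set(a2(f)) == {Y1, Y2}]
assert clash == []
print("no reversing cost function exists among all", 3 ** 3, "functions")
```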

G Fixed cost functions and choosing procedures

Since any deterministic search algorithm is a mapping from $d$ to $x \notin d^x$, any such algorithm is a vector $\vec a$ whose components are indexed by the possible populations; the value of each component is the $x$ that the algorithm produces given the associated population.

Consider now a particular population $d$ of size $m$. Given $d$, we can say whether any other population of size greater than $m$ has the (ordered) elements of $d$ as its first $m$ (ordered) elements. The set of those populations that do start with $d$ this way defines a set of components of any algorithm vector $\vec a$. Those components will be indicated by $\vec a_{\sqsupset d}$.

The remaining components of $\vec a$ are of two types. The first is given by those populations that are equivalent to the first $M < m$ elements in $d$, for some $M$. The values of those components for the vector algorithm $\vec a$ will be indicated by $\vec a_{\subset d}$. The second type consists of those components corresponding to all remaining populations. Intuitively, these are populations that are not compatible with $d$. Some examples of such populations are populations that contain as one of their first $m$ elements an element not found in $d$, and populations that re-order the elements found in $d$. The values of $\vec a$ for components of this second type will be indicated by $\vec a_{\perp d}$.

Let $proc$ be either $A$ or $B$. We are interested in

$$\sum_{a_1, a_2} P(c \mid f, d_1, d_2, proc) = \sum_{\vec a_{\sqsupset d_1},\, \vec a_{\subset d_1},\, \vec a_{\perp d_1}} \;\; \sum_{\vec a_{\sqsupset d_2},\, \vec a_{\subset d_2},\, \vec a_{\perp d_2}} P(c \mid f, d_1, d_2, proc).$$

The summand is independent of the values of $\vec a_{\perp d_1}$ and $\vec a_{\perp d_2}$, for either of our two $d$'s. In addition, the number of such values is a constant (it is given by the product, over all populations not consistent with $d$, of the number of possible $x$ each such population could be mapped to). Therefore, up to an overall constant independent of $d_1$, $d_2$, $f$, and $proc$, the sum equals

$$\sum_{\vec a_{\sqsupset d_1},\, \vec a_{\subset d_1}} \;\; \sum_{\vec a_{\sqsupset d_2},\, \vec a_{\subset d_2}} P(c \mid f, d_1, d_2, proc).$$

By definition, we are implicitly restricting the sum to those $\vec a$'s for which our summand is defined. This means that we actually only allow one value for each component in $\vec a_{\subset d_1}$ (namely, the value that gives the next $x$ element in $d_1$), and similarly for $\vec a_{\subset d_2}$. Therefore the sum reduces to

$$\sum_{\vec a_{\sqsupset d_1}} \; \sum_{\vec a_{\sqsupset d_2}} P(c \mid f, d_1, d_2, proc).$$

Note that no component of $\vec a_{\sqsupset d_1}$ lies in $d^x_1$. The same is true of $\vec a_{\sqsupset d_2}$. So the sum over $\vec a_{\sqsupset d_1}$ is over the same components of $a_1$ as the sum over $\vec a_{\sqsupset d_2}$ is of $a_2$. Now for fixed $d_1$ and $d_2$, $proc$'s choice of $a_1$ or $a_2$ is fixed. Accordingly, without loss of generality, the sum can be rewritten as

$$\sum_{\vec a_{\sqsupset d_1}} P(c \mid f, d_1, d_2),$$

with the implicit assumption that $c$ is set by $\vec a_{\sqsupset d_1}$. This sum is independent of $proc$.

H Proof of the theorem on choosing procedures

Let $proc$ refer to a choosing procedure. We are interested in

$$\sum_{a_1, a_2} P(c \mid f, m, k, a_1, a_2, proc) = \sum_{a_1, a_2} \sum_{d_1, d_2} P(c \mid f, d_1, d_2, proc)\, P(d_1, d_2 \mid f, k, m, a_1, a_2).$$

The sum over $d_1$ and $d_2$ can be moved outside the sum over $a_1$ and $a_2$. Consider any term in that sum, i.e., any particular pair of values of $d_1$ and $d_2$. For that term, $P(d_1, d_2 \mid f, k, m, a_1, a_2)$ is just 1 for those $a_1$ and $a_2$ that result in $d_1$ and $d_2$ respectively when run on $f$, and 0 otherwise. (Recall the assumption that $a_1$ and $a_2$ are deterministic.) This means that the $P(d_1, d_2 \mid f, k, m, a_1, a_2)$ factor simply restricts our sum over $a_1$ and $a_2$ to the $a_1$ and $a_2$ considered in our theorem. Accordingly, the result of the preceding appendix tells us that the summand of the sum over $d_1$ and $d_2$ is the same for choosing procedures $A$ and $B$. Therefore the full sum is the same for both procedures.
