Solving symbolic regression problems with uniform design-aided gene expression programming

J Supercomput
DOI 10.1007/s11227-013-0943-6
Solving symbolic regression problems with uniform
design-aided gene expression programming
Yunliang Chen · Dan Chen · Samee U. Khan · Jianzhong Huang · Changsheng Xie
© Springer Science+Business Media New York 2013
Abstract
Gene Expression Programming (GEP) significantly surpasses traditional evolutionary approaches to solving symbolic regression problems. However, existing GEP algorithms still suffer from premature convergence and slow evolution in the later generations (anaphase). To address these pitfalls, we designed a novel evolutionary algorithm, namely Uniform Design-Aided Gene Expression Programming (UGEP). UGEP uses (1) a mixed-level uniform table for generating the initial population and (2) multiparent crossover operators that take advantage of the dispersibility of uniform design. In addition to a theoretical analysis, we compared UGEP with existing GEP variants in a number of experiments on symbolic regression problems, including function fitting and chaotic time series prediction. Experimental results indicate that UGEP excels in both its capability of reaching the global optimum and its convergence speed in solving symbolic regression problems.
Keywords Gene expression programming · Uniform design · Symbolic regression problem · Function fitting · Time series prediction
Y. Chen · J. Huang · C. Xie
School of Computer Science, Huazhong University of Science & Technology, Wuhan, Hubei 430074, China

Y. Chen · D. Chen (✉)
School of Computer Science, China University of Geosciences, Wuhan 430074, China
e-mail: chendan@pmail.ntu.edu.sg

S. U. Khan (✉)
Department of Electrical and Computer Engineering, North Dakota State University, Fargo, ND, USA
e-mail: samee.khan@ndsu.edu
Y.Chen et al.
1 Introduction
Evolutionary algorithms have been widely adopted for symbolic regression problems in modern science and engineering, such as function fitting and time series prediction. To solve a symbolic regression problem, we normally need to establish a mathematical expression that fits a number of discrete data points. The goal is to minimize the error between the values computed with the expression and the actual values of the data points.
Existing Genetic Algorithms (GAs) and Genetic Programming (GP) methods [1,2] have achieved many successes in dealing with these problems [3–5]. In GAs, individuals are expressed as linear strings of fixed length (chromosomes) through all evolution steps. This makes GAs unsuitable for fitting some very complex functions [6]. In GP methods, individuals are expressed as nonlinear objects of different sizes and shapes. A GP method is normally capable of characterizing very complex functions, but the variety in object sizes often hampers the evolutionary procedure in reaching the optimal solution [7].
In contrast, Gene Expression Programming (GEP) is a salient approach to creating computer programs that denote the learned models and/or discovered knowledge [8,9]. GEP is similar to GAs and GP, and it differs from these evolutionary approaches mainly in chromosome encoding. GEP encodes individuals as chromosomes and implements them as linear strings of fixed length [9,10]. The separation of genotype and phenotype endows GEP with more flexibility and power in exploring the entire search space. The chromosomes of GEP are simple and linear, so they can be manipulated easily by the genetic operators, and yet they are capable of handling complex problems [7,11–16].
GEP exhibits significant advantages over its counterparts. For example, compared to GAs, GEP performs almost 2–4 orders of magnitude faster in solving general problems because of its special individual expression [6,9]. GEP can also find a brand new function that is much better than what GAs find [6,9]. Zuo et al. presented a typical symbolic regression example in which some users may be satisfied with the formula $y = ax^{3} + bx^{2} + cx + d$, whose coefficients a, b, c, and d can be found by a GA, while a much better and more complex formula, $y = \sin(ax^{3.5} + bx^{2.5} + dx)$, can be found using GEP [10].
The original GEP algorithm begins by randomly generating linear fixed-length chromosomes for the individuals of the initial population [9]. Each individual is evaluated by a fitness function in each generation. The individuals are then selected according to their fitness values to reproduce with modification. The new individuals are subjected to the same process. The evolution process continues until it reaches a prespecified number of generations or a solution is found. The original GEP still suffers from premature convergence and slow evolution in anaphase (e.g., a long period of evolution from the 300th generation to the 500th generation, with the 500th generation as the prespecified termination condition) [7,12,17]. Attempts have been made to overcome these pitfalls by redesigning some operations within the evolution process, such as the way solutions are formed, problem-specific individual initialization, the sampling of parents for propagation, and the crossover operators [8,18–20]. For example, the hybrid parallel GEP algorithm introduced in [20] achieves higher stability and search ability by combining simulated annealing with the genetic mechanism.
Solving symbolic regression problems
Although these methods have had some success, they ignore a critical point, which we consider the key to solving symbolic regression problems with GEP: whether the elements of the initial population properly represent the whole element set. Without a proper scheme, some key elements may be lost when the initial population is small; when it is large, some key elements may be repeated while others may still be lost.
In GEP, keeping the diversity of chromosomes plays an important role in the evolution process. Initialization of the population is the premise of the evolution process, and the quality of the initial population affects the diversity of chromosomes in every subsequent generation. If the elements of the initial population are sampled uniformly from the element sets, the key elements will not be missed. In this study, we propose a novel Gene Expression Programming algorithm aided by uniform design (referred to as UGEP) for initializing the population. By the properties of uniform design, it not only makes each sample more representative but also requires fewer experiments than orthogonal design [21,22]. Given $m$ factors each having $n$ levels, exhaustive experimentation requires $n^{m}$ experiments in total; orthogonal design requires $n^{2}$; and uniform design reduces the number dramatically to $n$.
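To make the difference in scale concrete, the following minimal sketch evaluates the three experiment counts quoted above. The function name `experiment_counts` and the example values (5 levels, 7 factors) are illustrative assumptions, not part of the original study.

```python
def experiment_counts(levels: int, factors: int) -> dict:
    """Number of experiments per design, using the counts quoted in the text."""
    return {
        "exhaustive": levels ** factors,  # n^m
        "orthogonal": levels ** 2,        # n^2
        "uniform": levels,                # n
    }


# Hypothetical example: 7 factors with 5 levels each.
counts = experiment_counts(levels=5, factors=7)
print(counts)
```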
In GEP, the individuals of the initial population are generated randomly from the element sets. In contrast, UGEP initializes the population from uniform tables generated from the element sets, which makes the individuals well distributed. In addition, UGEP uses adaptive multiparent crossover operators as genetic operators, which play an important part in the evolution process.
Function fitting and chaotic time series (e.g., sunspots) prediction problems were used to test whether the UGEP algorithm is capable of solving symbolic regression problems. In the function fitting test, an optimal parameter setting was obtained and the performance of UGEP was compared with that of the original GEP. In experiments on real data sets for chaotic time series (sunspots) prediction, two methods based on UGEP were applied, i.e., the sliding window prediction method and the differential equation prediction method. Results show that the proposed algorithm is efficient in predicting chaotic time series.
The remainder of this paper is organized as follows. Section 2 recaps the existing GEP algorithms and motivates this study. Section 3 presents the UGEP algorithm for symbolic regression problems, covering (1) the construction of the initial population based on a mixed-level uniform table and (2) the adaptive uniform crossover operator. Section 4 analyzes, in theory, UGEP's capability to achieve global convergence and its convergence speed. Section 5 presents the experiments and results of using UGEP for function fitting and chaotic time series prediction. Section 6 concludes the paper and presents future work.
2 Basics of gene expression programming
This section recaps the basics of Gene Expression Programming (GEP). GEP is a powerful evolutionary method derived from Genetic Programming (GP) to overcome the common drawbacks of GA and GP [9]. Similar to GA and GP, GEP follows the Darwinian principle of the survival of the fittest and uses populations of candidate solutions to a given problem in order to evolve new ones. The difference among GEP, GA, and GP lies in the way in which the individuals of a population of solutions are represented [9]. Although GEP has a simple and linear form, it is flexible and powerful in solving complex problems [6].

Fig. 1 An example of chromosome (single gene) and its decoding in GEP
In GEP, an individual (chromosome) is represented by a genotype constituted by one or more genes. A chromosome is a linear and compact entity that can be easily manipulated with genetic operators such as mutation, crossover, and transposition.
When using GEP to solve a problem, five components should be specified: the function set, the terminal set, the fitness function, the GEP control parameters, and the stop condition.
Generating the initial population of solutions is the first step. This can be done using a random process. The individuals are then expressed as expression trees (ETs; an example is given in Fig. 1), which are evaluated by a fitness function that determines how good a solution is in the problem domain. According to the fitness value of each chromosome, genetic operators such as crossover, mutation, and rotation are applied to the selected chromosomes. If a solution of satisfactory quality is found, or a predetermined number of generations is reached, the evolution stops and the “best-so-far” solution is returned.
2.1 Chromosome encoding
Each chromosome is a fixed-length character string, which can be composed of any element from the function set or the terminal set. Each gene has a head and a tail. The size of the head ($h$) is defined by the user, but the size of the tail ($t$) is obtained as a function of $h$ and a parameter $n$ (the maximum number of arguments taken by any function in the function set). The tail size is calculated by the following equation:
$$t = h\,(n - 1) + 1 \qquad (1)$$
Each gene is written in Karva notation (a K-expression) and can be mapped into an expression tree (ET). In the case of multigenic chromosomes, all ETs are connected at their root nodes by a linking function such as a Boolean function or the function “+”. Functions, terminals, and constants are allowed in the heads, whereas only terminals or constants are allowed in the tails. For example, the ET shown in Fig. 1 corresponds to a sample chromosome and can be interpreted in mathematical form. The chromosome shown in Fig. 2 is multigenic: it is constructed from three genes whose sub-ETs can be connected by a linking function. As shown in Fig. 2, the first gene is constructed from thirteen elements, i.e., “Q+aaa/babbaba”. The highlighted substring “babbaba” is the tail of the gene, while the substring “Q+aaa/” is the head. According to the rules of ET construction, the trailing elements “a/babbaba” are not expressed.

Fig. 2 A three-gene chromosome and its sub-ETs

Table 1 Statistic result of the initial population (occurrences of each element at each gene position)

Element   Gene position
            0    1    2    3    4    5    6    7    8    9
+          11    7    4    3    8   12    0    0    0    0
−          13    4    9    6    7    5    0    0    0    0
*           8   10    9    8    8    6    0    0    0    0
/           8   12    8    8   16    9    0    0    0    0
Q           2    4    7    7    5    6    0    0    0    0
E           6   10    7   11    5    3    0    0    0    0
S           2   12    5    9    7   12    0    0    0    0
T           2    3    5    8    9   11    0    0    0    0
C           1   10    6    6    1    2    0    0    0    0
A           7    5    5    7    7   10   17   18   19   20
B           9    5    8    7    6    7   16   23   21   20
C           8    7    5    8    9    4   22   17   26   16
D           9    7   11    8    9    7   19   24   22   18
E          14    4   11    4    3    6   26   18   12   26
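The encoding rules of this subsection can be checked with a short sketch. It computes the tail length from equation (1) and counts how many leading symbols of a K-expression are actually expressed in the ET. The arity table is an assumption: `Q` is taken to be a one-argument function (square root, as is conventional in GEP) and the arithmetic operators take two arguments; the function names are illustrative.

```python
# Assumed arities: Q is treated as a unary function, the rest as binary.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}


def tail_length(head_length: int, max_arity: int) -> int:
    """Equation (1): t = h * (n - 1) + 1."""
    return head_length * (max_arity - 1) + 1


def coding_length(gene: str) -> int:
    """Count the leading symbols that the expression tree actually uses."""
    open_slots = 1  # the root still has to be filled
    used = 0
    for symbol in gene:
        if open_slots == 0:
            break
        used += 1
        # A symbol fills one slot and opens as many slots as it has arguments.
        open_slots += ARITY.get(symbol, 0) - 1
    return used


gene = "Q+aaa/babbaba"  # first gene of Fig. 2
expressed = gene[:coding_length(gene)]
unused = gene[coding_length(gene):]
print(tail_length(6, 2), expressed, unused)
```

Under these assumed arities, a head of six symbols gives a tail of seven, which matches the thirteen-element gene, and the unused remainder is the substring named in the text.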
2.2 Population initialization
Population initialization is the first step of the evolution procedure. The quality of the chromosomes of the first generation plays an important role in the convergence process. The initial population needs to contain as many different individuals as possible in order to explore the search space efficiently in later generations [19]. The original GEP generates the initial population at random. For instance, Table 1 presents statistics of the initial population of a GEP program introduced in [23].
In this GEP program, the population size (p) is 100; the head length (h) is 6; the function set is {+, −, ∗, /, Q, E, S, T, C}; and the terminal set is {a, b, c, d, e}. From Table 1, we can see that generating the initial population randomly does not guarantee the diversity of chromosomes. For instance, the function “/” appears 16 times at gene position 4, while the function “C” appears only once. When the initial population is small, some key elements may be lost. Even when it is large, some key elements may be repeated while others may still be lost. We need an effective method to ensure that all key elements are generated in the first generation. To keep the diversity of chromosomes, it is necessary to make sure that the samples are well distributed. In [21], a uniform design method is proposed for industrial experimentation. The method uses a uniform table to keep the samples taken from the solution space well distributed. In GEP, we may adopt a uniform table in the same way when generating the initial population at the first step of the evolution procedure.
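The kind of per-position statistic shown in Table 1 can be reproduced with a small sketch of random initialization. This is an illustration only: the gene layout (a head of 6 followed by a tail of 4, mirroring the ten positions listed in Table 1), the seed, and the function names are assumptions, and the counts it produces will differ from those of the program in [23].

```python
import random
from collections import Counter

FUNCTIONS = list("+-*/QESTC")
TERMINALS = list("abcde")


def random_gene(head: int, tail: int, rng: random.Random) -> list:
    """Head symbols come from both sets, tail symbols from terminals only."""
    head_part = [rng.choice(FUNCTIONS + TERMINALS) for _ in range(head)]
    tail_part = [rng.choice(TERMINALS) for _ in range(tail)]
    return head_part + tail_part


def position_counts(population: list) -> list:
    """One Counter per gene position: how often each symbol occurs there."""
    length = len(population[0])
    return [Counter(gene[pos] for gene in population) for pos in range(length)]


rng = random.Random(0)  # fixed seed so the run is repeatable
population = [random_gene(6, 4, rng) for _ in range(100)]
counts = position_counts(population)

# Gap between the most and least frequent symbol at each head position.
for pos in range(6):
    column = [counts[pos].get(s, 0) for s in FUNCTIONS + TERMINALS]
    print(pos, max(column) - min(column))
```

A large gap at a position indicates the uneven coverage that the text attributes to random initialization.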
2.3 Genetic operators and selection method
GEP uses genetic operators, i.e., mutation, transposition, and crossover, to create variation for evolution. A mutation operator introduces a random change to the symbol at any position in a chromosome [9]. A transposition operator moves a sequence of elements of a gene to another place [17]. The crossover operator chooses and pairs two chromosomes to exchange some elements between them [19]. How to create variation efficiently depends on the nature of the problem under investigation.
Generally, after applying genetic operators to create variation in each generation, GEP selects some individuals based on their fitness and copies them into the next generation, for example with simple elitism [7] and cloning of the best individual. Typically, the roulette-wheel method [9], which is also used in many GA [1] and GP algorithms [24], is employed for selection.
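A minimal sketch of roulette-wheel selection combined with cloning of the best individual is given below. It assumes non-negative fitness values with a positive sum; the function names and the toy population are hypothetical and do not come from the cited implementations.

```python
import random


def roulette_select(population, fitnesses, count, rng):
    """Pick `count` individuals with probability proportional to fitness."""
    total = sum(fitnesses)
    chosen = []
    for _ in range(count):
        pick = rng.uniform(0, total)
        running = 0.0
        for individual, fit in zip(population, fitnesses):
            running += fit
            if running >= pick:
                chosen.append(individual)
                break
        else:
            chosen.append(population[-1])  # guard against rounding error
    return chosen


def next_generation(population, fitnesses, rng):
    """Simple elitism: clone the best individual, fill the rest by roulette."""
    best = max(zip(population, fitnesses), key=lambda pair: pair[1])[0]
    return [best] + roulette_select(population, fitnesses,
                                    len(population) - 1, rng)


# Toy example: individual "b" has zero fitness and is never selected.
new_population = next_generation(["a", "b", "c"], [1.0, 0.0, 3.0],
                                 random.Random(1))
print(new_population)
```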
3 Uniform design gene expression programming
Although GEP has shown its superiority over its counterparts, it suffers from pitfalls such as a high probability of premature convergence and slow evolution in anaphase. This means that, when solving symbolic regression problems, the final result may fall into a local optimum, or it may take an extremely long time to find the global optimal solution. The diversity of chromosomes is a key factor in the GEP evolution procedure, and well-distributed samples are the premise for keeping that diversity. To ensure that the key elements are included at initialization, we designed a new GEP algorithm based on uniform design, namely UGEP, to tackle the pitfalls of the original GEP, centering on population initialization. In addition, UGEP uses the uniform optimization method (see Sect. 3.3) instead of stochastic evolution.
3.1 The flow of the UGEP algorithm
A UGEP process can be separated into several parts: (1) population initialization; (2) genetic operation, selection, and reserving; and (3) revealing the global solution. The UGEP algorithm flow is illustrated in Fig. 3.

Fig. 3 The UGEP algorithm flow

Block 1: Set the parameters: the chromosome's head length; the probabilities of the three evolutionary operators (multiple cross-hybridization, gene recombination, and Dc domain recombination); and the function set and terminal set;
Block 2: Initialize the population P: generate the uniform tables from the elements of the function set and terminal set, then initialize the individuals according to the uniform tables (see Algorithm 1);
Block 3: According to the probability of multiple cross-hybridization, use the adaptive multiparent crossover operator to produce offspring (see Algorithm 2), and then apply gene recombination and Dc domain recombination to all offspring. Rank all offspring according to their fitness values;
Block 4: Adopt an elitism strategy for all individuals: keep the current optimal solution and eliminate the worst individuals of the parent population at a preset ratio;
Block 5: If the global optimal solution is found or the preset maximum number of generations is reached, end the evolution process. Otherwise, go to Block 3.
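The control flow of Blocks 2 to 5 can be sketched as a generic loop. This is a skeleton under stated assumptions rather than the authors' implementation: every operator is passed in as a caller-supplied function, the parameter `drop_ratio` stands in for the preset elimination ratio, and the toy problem at the end (integers with an optimum at 42) is purely hypothetical.

```python
def evolve(init_population, crossover, recombine, fitness,
           max_generations, target, drop_ratio=0.2):
    """Skeleton of Blocks 2-5; every callable is supplied by the caller."""
    population = init_population()                    # Block 2
    size = len(population)
    best = max(population, key=fitness)
    for _ in range(max_generations):                  # Block 5: generation limit
        if fitness(best) >= target:                   # Block 5: solution found
            break
        # Block 3: crossover, then recombination, then rank the offspring.
        offspring = [recombine(child) for child in crossover(population)]
        offspring.sort(key=fitness, reverse=True)
        # Block 4: drop the worst parents at a preset ratio, keep the elite.
        parents = sorted(population, key=fitness, reverse=True)
        keep = size - int(size * drop_ratio)
        population = parents[:keep] + offspring[:size - keep]
        best = max(population + [best], key=fitness)
    return best


# Toy problem (hypothetical): individuals are integers, the optimum is 42.
def toy_fitness(x):
    return -abs(x - 42)


result = evolve(
    init_population=lambda: list(range(0, 50, 5)),
    crossover=lambda pop: [x + 1 for x in pop] + [x - 1 for x in pop],
    recombine=lambda child: child,
    fitness=toy_fitness,
    max_generations=50,
    target=0,
)
print(result)
```

Passing the operators as functions keeps the loop independent of how chromosomes are encoded, so the same skeleton could host the uniform-table initialization and the multiparent crossover described in the following subsections.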
3.2 Initializing population upon a mixed-level uniform table
The uniform design [21] is one of the space-filling designs and has been widely used in experimentation. Its main objective is to sample a small set of elements from a given set such that the sampled elements are uniformly scattered and maintain the characteristics of the whole set. A uniform table can be expressed by a matrix $U_{M}(Q^{S})$, where $S$ is the number of factors, $Q$ is the number of levels, and $M$ is the number of sample combinations selected from the whole space $Q^{S}$ [25].

Fig. 4 The algorithm of construction of the mixed-level uniform table
In UGEP, the initial population is generated in accordance with the uniform table. The matrix is filled with the elements of chromosomes or genes, and each row represents an individual. The mixed-level uniform table can be constructed by Algorithm 1, which is shown in Fig. 4.
Algorithm 1 (Construct the mixed-level uniform table for initializing the population)
Block 1: Set the parameters of the chromosome and gene: the length of the head $h$; the function set $f(x)$ and its number of elements $f$; the terminal set $var(x)$ and its number of elements $v$; the length of the tail $t = h \ast (n - 1) + 1$; the number of genes $g$; the number of chromosomes $n \ast (f + v + 1)$; and a constant integer $n$. Then give each element in $f(x)$ and $var(x)$ a tag number to serve as the basic entries of the matrix;
Block 2: Construct a matrix $U_{n \ast k}((n \ast k)^{h+t})$ and fill in the elements using the generation-vectors method [18]. Here $h + t$ is the number of columns, $n \ast k$ is the number of rows and also the number of levels of each factor, and $k \in (v + 1, f + v + 1)$ is generated randomly;
Block 3: In GEP, the number of levels of the head is not equal to that of the tail. This step adjusts the uniform table $U_{n \ast k}((n \ast k)^{h+t})$ [21] by separating the columns into two classes (the elements of the head and the elements of the tail) according to formula (2):
$$u'_{ij} = \begin{cases} u_{ij} \bmod (f + v + 1), & i \in (1, n \times k),\ j \in (1, h) \\ u_{ij} \bmod (v + 1), & i \in (1, n \times k),\ j \in (h + 1, h + t) \end{cases} \qquad u_{ij} \leftarrow u'_{ij} \qquad (2)$$
Then the uniform table $U_{n \ast k}((n \ast k)^{h} \bullet (v + 1)^{t})$ is constructed.
Block 4: Let $U_{n \ast k}((n \ast k)^{h} \bullet (v + 1)^{t})$ be the basic table, and let the matrix $U_{n \ast (f + v + 1 - k)}((v + 1)^{h+t})$ be the empty table. Then circularly fill the empty table with the elements of the basic table (see fractional-addition in [21]). This gives the mixed-level uniform table for the gene being processed, $U_{n \ast (f + v + 1)}((f + v + 1)^{h} \bullet (v + 1)^{t})$;
Block 5: If $g = 1$, go to Block 6. Otherwise, update $k$, set $g = g - 1$, and go to Block 2;
Block 6: The matrix $U_{n \ast (f + v + 1)}(g \bullet (f + v + 1)^{h} \bullet (v + 1)^{t})$ is obtained from the $g$ matrices using the direct-product method [21].
In UGEP, each factor of the head has more levels to choose from than each factor of the tail. In Algorithm 1, after constructing the matrix $U_{n \ast k}((n \ast k)^{h+t})$, UGEP adjusts the matrix according to formula (2) to balance the number of levels of the head and the number of levels of the tail. Assuming that a chromosome is constructed from $g$ genes, the mixed-level uniform table is obtained from $g$ matrices and is then used to initialize the population, with each row representing an individual.
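The core of Blocks 2 and 3 can be sketched for a single gene as follows. Several simplifications are assumptions of this sketch and not the authors' procedure: the base table is filled with a good-lattice-point construction as a stand-in for the generation-vectors method of [18], the fractional-addition step of Block 4 and the direct product of Block 6 are omitted, and the extra level implied by the "+1" in formula (2) is mapped to a placeholder constant symbol `?`.

```python
from math import gcd


def lattice_table(rows: int, cols: int) -> list:
    """Good-lattice-point table: entry (i, j) is i * g_j mod rows."""
    gens = [g for g in range(1, rows + 1) if gcd(g, rows) == 1]
    return [[(i * gens[j % len(gens)]) % rows for j in range(cols)]
            for i in range(1, rows + 1)]


def initial_genes(functions: str, terminals: str,
                  head: int, tail: int, rows: int) -> list:
    """Map a uniform table to genes using the two moduli of formula (2)."""
    # Tags 0..v are terminals plus the constant; larger tags are functions.
    symbols = list(terminals) + ["?"] + list(functions)
    f, v = len(functions), len(terminals)
    genes = []
    for row in lattice_table(rows, head + tail):
        head_part = [symbols[u % (f + v + 1)] for u in row[:head]]
        tail_part = [symbols[u % (v + 1)] for u in row[head:]]
        genes.append("".join(head_part + tail_part))
    return genes


# Hypothetical small setting: 5 functions, 2 terminals, 16 individuals.
genes = initial_genes("+-*/Q", "ab", head=3, tail=4, rows=16)
for gene in genes:
    print(gene)
```

In this setting every head position contains each of the eight head symbols exactly twice, which is the even coverage that random initialization does not guarantee.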
3.3 Adaptive crossover operator
Genetic operation is the strategy applied in the evolution procedure to find the global optimal solution. We use a crossover operator based on a multiparent method, which is also empowered by uniform design. The crossover operator is designed using a uniform optimization method (see Algorithm 2) instead of stochastic evolution.
Given $m$ individuals (i.e., chromosomes) in a generation during evolution, each chromosome is divided into $n$ exclusive gene segments. A uniform table is designed to sample $n$ gene segments from those parents to form an offspring.
The crossover operator is adaptive. The scale of hybridization is controlled by the parents' current state. If the distance between the parents' fitness value $f_{p}$ and the current best fitness value $f_{\max}$ becomes larger, a uniform table of a larger scale is constructed for hybridization, which enables communication among more parents and thus increases the chances of accommodating excellent gene segments. If the distance becomes shorter, a uniform table of a smaller scale is used, which avoids excessive changes to excellent gene segments. The number of parents in the uniform table is determined as follows:
$$m_{i} = \left\lceil \left(1 - \frac{f_{p}}{f_{\max}}\,\delta\right) \times m_{i-1} \right\rceil \qquad (3)$$
where $\delta \in (0, 1)$ and $i$ is the index of the current generation. The algorithm is illustrated in Fig. 5.
Fig. 5 The algorithm of the adaptive multiparent crossover operation

Algorithm 2 (Adaptive multiparent crossover operation)
Block 1: Randomly divide the chromosome ($L$) into $n$ disjoint subsets $L_{i}$ ($i \in (1, n)$), where $L_{i} \cap L_{j} = \Phi$ ($i \neq j$) and $\bigcup_{i=1}^{n} L_{i} = L$; select parents for hybridization and add them to a “matching pool”;
Block 2: Calculate the number of parents ($m$) of the current generation using formula (3), randomly select $m - 1$ parents from the population, and add them to the matching pool;
Block 3: Construct the crossover uniform table $U_{m}(m^{n})$. Each row represents a new offspring. Calculate the fitness value of each offspring $G_{i}(L_{1}, L_{2}, \ldots, L_{n})$, $i \in \{1, 2, \ldots, m\}$;
Block 4: Among the offspring $G_{i}(L_{1}, L_{2}, \ldots, L_{n})$ generated by hybridization between the parents, reserve the chromosomes that have the best fitness value.
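Blocks 3 and 4 of the crossover can be illustrated with the sketch below. It rests on assumptions that are not in the original: the uniform table is again built with a good-lattice-point construction, the cut points are passed in rather than drawn at random, the matching pool is given directly as a list of parents, and the fitness function and parent strings are toy placeholders.

```python
from math import gcd


def lattice_table(rows: int, cols: int) -> list:
    """Good-lattice-point table: entry (i, j) is i * g_j mod rows."""
    gens = [g for g in range(1, rows + 1) if gcd(g, rows) == 1]
    return [[(i * gens[j % len(gens)]) % rows for j in range(cols)]
            for i in range(1, rows + 1)]


def multiparent_crossover(parents: list, cut_points: list, fitness):
    """Build one offspring per table row and return the fittest one."""
    bounds = [0] + list(cut_points) + [len(parents[0])]
    segments = list(zip(bounds, bounds[1:]))  # the disjoint subsets L_1..L_n
    offspring = []
    for row in lattice_table(len(parents), len(segments)):
        # Segment j of this child is copied from the parent named in column j.
        child = "".join(parents[p][a:b] for p, (a, b) in zip(row, segments))
        offspring.append(child)
    return max(offspring, key=fitness)


# Toy example: fitness counts the positions that match a target string.
target = "bbccbb"
best = multiparent_crossover(
    ["aaaaaa", "bbbbbb", "cccccc"],
    cut_points=[2, 4],
    fitness=lambda c: sum(x == y for x, y in zip(c, target)),
)
print(best)
```

Because each column of the table is a permutation of the parent indices, every parent contributes each segment to exactly one offspring.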
4 A theoretical analysis on UGEP
In this section, we analyze the UGEP algorithm's global convergence and its computational complexity. The algorithm initializes its population using the mixed-level uniform table and adopts the multiparent crossover operator.
4.1 Global convergence
Definition 1 Let $f^{*} = \max(f(x))$ be the global maximum in the search space $S$. The set of the global optimal solutions is defined as $M(x^{*}) = \{x \in S \mid f(x) = f^{*}\}$.
Definition 2 For any $\varepsilon > 0$, the ε-field set of the optimal solutions is defined as $M_{(x-\varepsilon)} = \{x \in S \mid f(x) \geq f^{*} - \varepsilon\}$, where $m(M_{(x-\varepsilon)}) > 0$ and the function $m(A)$ represents the Lebesgue measure of $A$ [21].
Theorem 1 For the UGEP algorithm, after a limited number of generations, the population $P$ eventually covers the set $M_{(x-\varepsilon)}$ with a probability of convergence $p > 0$.
Proof Let us assume a subspace $V_{m}$, which is formed by randomly selecting $m$ parents from the population $P$. After applying the multiparent crossover operator, their offspring obey the uniform distribution in the space $V_{m}$. If the condition $M_{(x-\varepsilon)} \subseteq V_{m}$ holds, the probability of finding $M_{(x-\varepsilon)}$ in the next generation is $\frac{s}{\mu} \bullet \varepsilon > 0$, where $\mu = |M_{(x-\varepsilon)} \cap V_{m}|$ and $s$ represents the number of individuals in $V_{m}$.
Since the individuals are uniformly distributed in $V_{m}$, if $M_{(x-\varepsilon)} \subset V_{m}$, then the probability that we have an individual $i \in \{i \in P \mid i \notin V_{m}\}$ is $C_{N}^{N-m} \bullet \mu'$, where $\mu' = |V_{m} \cap S|$. Therefore, a new $m$-dimensional subspace $V'_{m}$ ($V'_{m} \neq V_{m}$) can be generated by two randomly selected individuals $i$ and $j$, where $i \notin V_{m}$ and $j \in P$.
If $M_{(x-\varepsilon)} \subseteq V'_{m}$, we have $p > 0$. Otherwise, a new space $V'_{m}$ will be generated. After a limited number of generations for extending the new subspace, based on the concave associativity, the probability that the population covers $M_{(x-\varepsilon)}$ can be formalized as
$$p = \min\left\{ \frac{s}{\mu} \bullet \varepsilon,\ \sum_{i=m}^{N} C_{N}^{N-i} \bullet \mu' \right\} > 0. \qquad \square$$
Theorem 2 The UGEP algorithm converges to the global optimal solution set $M(x^{*})$ with a probability of 1. In other words, for any $\omega$ we always have $\lim_{k \to \infty} P(|f^{*} - f_{k}(x)| < \omega) = 1$, where $f_{k}(x)$ represents the best solution in the population at the $k$th generation.
Proof According to Theorem 1, the probability of generating an individual $i$ satisfying $i \in M_{(x-\varepsilon)}$ at the $k$th generation is $p$. Otherwise, the failure probability for the optimal solution is $p_{k} \leq 1 - p$, where $p_{k}$ represents the probability of generating the offspring $i$ ($i \notin M_{(x-\varepsilon)}$) at the $k$th generation.
Since the elite replacement strategy is applied in the UGEP algorithm, the inequality $P(|f^{*} - f_{k}(x)| < \omega) \leq 1 - p\{(i \notin M_{(x-\varepsilon)}) \text{ at the } k\text{th generation}\} \leq 1 - (1 - p)^{k}$ holds for any $\omega$. Furthermore, according to the theorem of infinite product, we have $\lim_{k \to \infty} P(|f^{*} - f_{k}(x)| < \omega) = 1 - \prod_{k=1}^{\infty} (1 - p)^{k} = 1$, which means the UGEP algorithm converges to the optimal solution with a probability of 1. $\square$
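The limiting behaviour used in the proof can be illustrated numerically. The sketch below evaluates the term $1 - (1 - p)^{k}$ for a fixed per-generation success probability; treating the generations as independent trials and the value 0.05 for $p$ are assumptions made only for this illustration.

```python
def hit_probability(p: float, generations: int) -> float:
    """Probability of at least one success in `generations` independent trials."""
    return 1.0 - (1.0 - p) ** generations


# Hypothetical per-generation success probability p = 0.05.
for k in (1, 10, 100, 500):
    print(k, hit_probability(0.05, k))
```

Any fixed $p > 0$ drives the value toward 1 as $k$ grows, which is the property the theorem relies on.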
4.2 Convergence speed
Let $H$ be a template in the population of the $k$th generation, and let Generation$(H, k)$ be the set of individuals in the population Generation$(k)$ that hold the template, where $k$ denotes the $k$th generation. $N(H, k)$ is the number of individuals that hold the template $H$, that is, $N(H, k) = |\text{Generation}(H, k)|$. The number of individuals that hold the template $H$ in Generation$(k + 1)$ [26] can be expected from the formula $E(H, k + 1) = f(H, k) \times N(H, k)$, where $f(H, k)$ is given by formula (4):
$$f(H, k) = \frac{\sum_{g=1}^{N(H,k)} \text{fitness}(g) \,/\, N(H, k)}{\sum_{g=1}^{|\text{Generation}(k)|} \text{fitness}(g) \,/\, |\text{Generation}(k)|} \qquad (4)$$
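Formula (4) is the ratio of the mean fitness of the individuals holding the template to the mean fitness of the whole population, and the expected count follows by multiplying with the current count. A minimal sketch with hypothetical fitness values:

```python
def template_fitness_ratio(template_fitnesses, population_fitnesses):
    """Formula (4): mean template fitness over mean population fitness."""
    mean_template = sum(template_fitnesses) / len(template_fitnesses)
    mean_population = sum(population_fitnesses) / len(population_fitnesses)
    return mean_template / mean_population


def expected_count(template_fitnesses, population_fitnesses):
    """E(H, k + 1) = f(H, k) * N(H, k)."""
    ratio = template_fitness_ratio(template_fitnesses, population_fitnesses)
    return ratio * len(template_fitnesses)


# Hypothetical generation of four individuals; two of them hold the template.
population = [1.0, 2.0, 3.0, 6.0]
holders = [3.0, 6.0]
print(template_fitness_ratio(holders, population),
      expected_count(holders, population))
```

A ratio above 1 means the template is expected to spread in the next generation.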
Furthermore, let $M$ be the set of individuals involved in genetic operations. According to formula (4), the following relationship holds for the original GEP:
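$$f_{O}(H, k) = \frac{|\text{Generation}(k)|}{N(H, k)} \times \frac{\sum_{g \in \text{Generation}(H,k) \setminus M} \text{fitness}(g) + \sum_{g \in M} \text{fitness}_{O}(g)}{\sum_{g \in \text{Generation}(k) \setminus M} \text{fitness}(g) + \sum_{g \in M} \text{fitness}_{O}(g)} \qquad (5)$$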
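Similarly, for UGEP the following relationship holds:
$$f_{U}(H, k) = \frac{|\text{Generation}(k)|}{N(H, k)} \times \frac{\sum_{g \in \text{Generation}(H,k) \setminus M} \text{fitness}(g) + \sum_{g \in M} \text{fitness}_{U}(g)}{\sum_{g \in \text{Generation}(k) \setminus M} \text{fitness}(g) + \sum_{g \in M} \text{fitness}_{U}(g)} \qquad (6)$$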
As concluded in [27], the best result generated from a set of uniform tests is better than a fraction $N/(N + 1)$ of the solutions obtained in exhaustive tests, where $N$ is the number of uniform tests. For example, given a uniform table $U_{8}(2^{7})$, the best solution from the eight uniform tests is not worse than the top fifteen ($2^{7} \ast 1/9$) solutions of the $2^{7}$ exhaustive tests. In other words, under the same conditions, the individuals generated from a set of uniform tests are, as a whole, better than those generated from random tests. In addition, due to the property of the elite replacement strategy, the sum of the individuals' fitness values in the uniform set is always larger:
$$\sum_{g \in M} \text{fitness}_{U}(g) \geq \sum_{g \in M} \text{fitness}_{O}(g) \qquad (7)$$
Furthermore, we have
$$\sum_{g \in \text{Generation}(H,k) \setminus M} \text{fitness}(g) \leq \sum_{g \in \text{Generation}(k) \setminus M} \text{fitness}(g) \qquad (8)$$
According to formulas (6), (7), and (8), we obtain formula (9):
$$\begin{cases} f_{O}(H, k) \leq f_{U}(H, k) \\ E_{O}(H, k + 1) \leq E_{U}(H, k + 1) \end{cases} \qquad (9)$$
where $E_{O}$ and $E_{U}$ denote the expected values corresponding to $f_{O}$ and $f_{U}$ in the $(k + 1)$th generation.
In summary, the number of optimal offspring in the subsequent generation increases more quickly in UGEP than in the original GEP. The UGEP algorithm therefore tends to explore better solutions than the original GEP algorithm does.
5 Performance evaluation
Two types of symbolic regression experiments were performed to evaluate the performance of UGEP. We first explored the optimal parameter settings for the UGEP algorithm through the study of a function fitting problem. After that, we compared the performance of UGEP on the function fitting problem with that of the original GEP. We then used UGEP with the optimal parameters to predict the sunspot time series with two alternative prediction methods. All experiments were executed on a desktop computer with the following configuration: CPU, AMD AM2 Athlon 64 X2 5000 (2.6 GHz); RAM, 2 GB; operating system, Windows XP Professional, Service Pack 2.
5.1 Parameter setting in a function fitting problem
The key parameters of the UGEP algorithm include the head length (H), the number of genes (M), the probability of gene recombination (Pr), the population scale (S), the number of evolution generations (G), the rate of multiparent crossover (Pc), and the probability of Dc domain recombination (Pd). Using the method introduced in [28,29], we treat the seven parameters as seven factors to construct the uniform table $U_{10}(5^{7})$, which consists of 10 rows, each representing a set of parameters (see the first eight columns of Table 2). The parameters were specified with empirical values as suggested in [9,10,20].
In this set of experiments, we selected an experimental function conforming to the expression $y = 10^{(1 - e^{(-0.38 \times x)} \times \cos(0.7 \times x))}$, where $x$ is a variable of float type in [0, 6]. R-square is calculated to evaluate the fitting accuracy by comparing the fitted function with the experimental function. A larger R-square represents a higher degree of model fitting. We repeated 100 independent trials for each parameter setting. Table 2 presents the averaged experimental results and their standard deviation for each combination of parameters.
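The evaluation described above can be sketched as follows. Two assumptions are made here: the experimental function is read with the bracketed term as the exponent of 10, and R-square is computed as the usual coefficient of determination, which the text does not spell out. The sampling grid of 61 points and the crude linear candidate are illustrative choices only.

```python
import math


def target(x: float) -> float:
    """Experimental function, read as 10 raised to the bracketed term."""
    return 10 ** (1 - math.exp(-0.38 * x) * math.cos(0.7 * x))


def r_square(actual, predicted) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Sample the interval [0, 6] on an evenly spaced grid.
xs = [6 * i / 60 for i in range(61)]
ys = [target(x) for x in xs]

# Hypothetical candidate model, used only to exercise the metric.
candidate = [1 + 2 * x for x in xs]
print(r_square(ys, candidate))
```

A candidate that reproduces the samples exactly scores 1, and one that always predicts the sample mean scores 0.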
The experimental results indicate that the UGEP algorithm achieves a very high accuracy with the 7th parameter setting. The averaged value of R-Square is 0.995039 and the best value of R-Square is 0.998652. The standard deviation of R-Square for each parameter combination is less than 0.006, which indicates that the UGEP algorithm consistently performs well on this function fitting problem.

Table 2 The experimental results of function fitting using UGEP

No.   H    M   Pr     S    G     Pc    Pd     R-Square   Standard deviation
1     6    2   0.16   40   200   1.0   0.05   0.925626   0.004378
2     6    2   0.04   10   500   0.6   0.05   0.900774   0.002158
3     2    1   0.20   20   300   0.6   0.25   0.781383   0.005264
4     8    4   0.08   10   100   0.8   0.20   0.915682   0.003829
5     4    3   0.08   50   400   1.0   0.20   0.893454   0.001617
6     4    4   0.16   20   200   0.2   0.10   0.959686   0.003374
7     8    3   0.12   30   500   0.2   0.25   0.995039   0.002418
8     10   1   0.12   50   100   0.4   0.15   0.820848   0.001901
9     10   5   0.20   30   400   0.8   0.10   0.987813   0.004519
10    2    5   0.04   40   300   0.4   0.15   0.965160   0.003827
Although a number of factors contribute to the performance of UGEP, the experiments suggest that the number of genes (M) has a significant impact on the evolution process. As indicated in Table 2, the results of the 100 trials with the 3rd parameter setting (M = 1) all fall into a local value of 0.781383 without exception; the averaged result of the experiments with the 8th parameter setting (M = 1) is also low compared to the others. In general, better results are obtained as the value of M increases. However, the best result we have is with the 7th parameter setting (M = 3).
It can also be observed that the results with the 7th and 9th parameter settings, where the population scale S is 30, are the best. This means that in UGEP, a suitable population scale is needed to achieve the optimal solution. In the original GEP, a large population is usually needed to extend the search space. In UGEP, we believe that the adaptive crossover operator enables communication among more parents and thus increases the chances of accommodating excellent gene segments.
From the above, we obtain the following ranges of empirical parameter settings suitable for the UGEP algorithm: the head length H in [4, 8], the number of genes M in [3, 5], the probability of gene recombination Pr in [0.1, 0.2], the population scale S in [30, 40], the number of evolution generations G in [400, 500], the rate of multiparent crossover Pc in [0.1, 0.3], and the probability of Dc domain recombination Pd in [0.05, 0.1].
We eventually identified a set of parameters: H(6), M(5), Pr(0.16), S(30), G(500), Pc(0.23), and Pd(0.75). After executing 100 trials with this parameter setting, the averaged value of R-Square is 0.998333, the best value is 0.999795, and the standard deviation is 0.002368. The resulting function is as follows:
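$$
\begin{aligned}
Y ={}& \big(\cos(x)\;\text{[?]}\;\operatorname{sqrt}(\operatorname{abs}(0.713614))\big)\;\text{[?]}\;\big(\cos(x)\;\text{[?]}\;x\big) \\
&+ \cos(\operatorname{sqrt}(\operatorname{abs}(\log(\operatorname{abs}(10^{\cos(\cos(0.755058))}))))) + x \\
&+ \cos(\operatorname{sqrt}(\operatorname{abs}(0.198157 \ast 0.518540)) + \sin(\cos(x))) + x
\end{aligned}
$$
(R-Square = 0.999795)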
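R-Square is the quality measure reported throughout this section. Its definition is not restated here, so the following sketch assumes the usual coefficient of determination, 1 − SS_res/SS_tot, and shows how the mean, best value, and standard deviation over repeated trials could be summarized. The function names `r_square` and `summarize_trials` are illustrative, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def r_square(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equally long")
    y_bar = mean(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-Square is undefined")
    return 1.0 - ss_res / ss_tot


def summarize_trials(scores):
    """Mean, best (largest), and population standard deviation of per-trial scores."""
    return {"mean": mean(scores), "best": max(scores), "std": pstdev(scores)}
```

A model that reproduces the observations exactly scores 1, and a model that always predicts the mean of the observations scores 0.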
Fig.6 The fitness convergence curves of UGEP vs.the original GEP
The final parameter setting has been applied in the subsequent experiments. To
evaluate the performance of UGEP, the original GEP (referred to as OGEP) has been
applied to the same function fitting problem. After 100 independent trials of
parameter-setting experiments, an optimal parameter setting was found for OGEP:
H(6), M(5), Pr(0.20), S(50), G(500), Pc(0.23), and Pd(0.75). Each experiment with
the optimal parameter setting has been repeated ten times, and the averaged results
are presented in Fig.6. The best fitness value at each generation is recorded. The two
convergence curves show how these averaged values change over the generations for
UGEP and OGEP.
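The curves in Fig.6 are averages, over repeated runs, of the best fitness value recorded at each generation. A minimal sketch of that averaging is given below, assuming each run is stored as a list of best-fitness values indexed by generation; the data layout and the function names are hypothetical.

```python
def average_best_fitness(runs):
    """Average the per-generation best fitness over several runs.

    runs[r][g] is the best fitness found in generation g of run r.
    """
    if not runs:
        raise ValueError("at least one run is required")
    n_generations = len(runs[0])
    if any(len(run) != n_generations for run in runs):
        raise ValueError("all runs must cover the same number of generations")
    # zip(*runs) groups the values of all runs generation by generation.
    return [sum(values) / len(runs) for values in zip(*runs)]


def fitness_gap(curve_a, curve_b):
    """Per-generation difference between two averaged convergence curves."""
    return [a - b for a, b in zip(curve_a, curve_b)]
```

Applying `fitness_gap` to the averaged UGEP and OGEP curves gives the per-generation gap between the two algorithms.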
Figure 6 shows that at the early stage (generations 0–50), the best fitness values of
both evolutionary processes increase quickly, with UGEP consistently achieving the
higher best fitness value. From generation 50 onward, the superiority of UGEP over the
original GEP becomes increasingly pronounced (the gap grows from 16 at the 50th
generation to 41 at the 500th generation).
The original GEP has an averaged R-Square value of 0.984667, and its best value
among the ten runs is 0.986652. The resulting function is as follows:
$$
\begin{aligned}
Y ={}& \sin(x) + \big(\sin(\sin(0.046175)) + (0.056429 + x)\big) \ast x \\
&+ \tan(\cos(\cos(0.479415)) \ast \sin(\operatorname{sqrt}(\operatorname{abs}(x)))) \\
&+ \big((s \ast 0.628590)\;\text{[?]}\;(x\;\text{[?]}\;0.868160)\big) \ast \sin(0.868160 - x) \\
&+ \cos(\cos(\cos((x / 0.253029) / (-0.217811))))
\end{aligned}
$$
(R-Square = 0.986652)
Table 3 presents the overall execution times of UGEP and the original GEP and
highlights the times for population initialization. Although UGEP takes longer than
the original GEP to initialize the population (0.048 s vs. 0.025 s), its overall
execution time is significantly shorter (18.32 s vs. 26.47 s); the overhead incurred by
the construction of the uniform table is negligible.
Table 3 The average population initialization time and overall execution time of UGEP and the original GEP
Time                              UGEP    The original GEP
Population initialization (sec)   0.048   0.025
Overall execution time (sec)      18.32   26.47
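As a quick check of the runtime claim, the figures in Table 3 can be combined directly. The arithmetic below uses only the values reported in the table, with results rounded.

$$
\begin{aligned}
\text{initialization overhead} &= 0.048 - 0.025 = 0.023\ \text{s} \approx 0.13\,\%\ \text{of}\ 18.32\ \text{s},\\
\text{reduction in overall time} &= \frac{26.47 - 18.32}{26.47} = \frac{8.15}{26.47} \approx 30.8\,\%,\\
\text{speedup} &= \frac{26.47}{18.32} \approx 1.44 .
\end{aligned}
$$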
5.2 Sun spot prediction via slide window prediction method (SWPM)
The Sun Spot Time Series prediction is a classic benchmark to evaluate algorithms
for chaotic time series prediction [30,31].We used the real sun spot data in order to
have an in-depth examination on the UGEP algorithm’s performance.
The first round of experiments used a Slide Window Prediction Method (SWPM)
[32]. The method predicts the value of an element in a time series from its history.
It operates as follows: given the length of a sliding window (h, i.e., the dimension)
and the values of the elements of a time series ($x_i$, $1 \le i \le n$) covered by the
window, the method finds a function f such that, for any m ($n - h + 1 \le m \le n$),
the value of $x_m$ is predicted as

$$
\hat{x}_m = f(x_{m-h}, x_{m-h+1}, \ldots, x_{m-2}, x_{m-1}), \quad (h < m \le n) \tag{10}
$$

Obviously, the difference between $\hat{x}_m$ and $x_m$ should be as small as possible
to ensure accuracy.
In the time series prediction experiments, we compared UGEP with SA-MGEP
and OGEP. The parameter setting described in Sect.5.1 applies to OGEP. The
test data (Wolfer Sun Spot Time Series) and the parameter setting for SA-MGEP are
available in [20]. The time series contains 100 data elements. Assuming the dimension
is n, (100 − n) data sets can be used in the experiments. Of these (100 − n) data
sets, the first half is used for training and the second half for evaluation.
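The construction of the data sets can be sketched as follows: each sample pairs a window of consecutive values with the value that follows it, and the samples are then split into a training half and an evaluation half. This is a minimal illustration with a delay of 1; the function names `sliding_windows` and `split_half` are hypothetical.

```python
def sliding_windows(series, h):
    """Build (inputs, target) samples with inputs = series[m-h:m], target = series[m]."""
    if not 0 < h < len(series):
        raise ValueError("window length must be between 1 and len(series) - 1")
    return [(series[m - h:m], series[m]) for m in range(h, len(series))]


def split_half(samples):
    """First half for training, second half for evaluation."""
    mid = len(samples) // 2
    return samples[:mid], samples[mid:]


# Placeholder series of 100 values standing in for the sun spot data.
series = [float(i) for i in range(100)]
samples = sliding_windows(series, 6)
train, evaluation = split_half(samples)
```

With 100 elements and a window of length 6, this yields 94 samples, matching the (100 − n) count given above.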
The configurations of the experiments are as follows: the dimension is 6, 10, or 12;
the delay is 1; the function set is F = {+, −, ∗, /, ˆ, sin, cos, exp, ln} (xˆn means
$x^n$, 0 < n < 5); and the terminal set is T = {a, ?} (? denotes a random constant
and a is an independent variable). For each configuration, 100 trials of experiments
were executed. The average results (R-Square, execution time, and success rate) are
shown in Fig.7.
As shown in Fig.7, UGEP performs better than SA-MGEP and the original GEP
in chaotic time series prediction using SWPM under all configurations. As the
dimension of the sliding window increases, the differences among UGEP, SA-MGEP, and
OGEP increase significantly. UGEP maintains a success rate higher than 90 %, while
that of SA-MGEP drops from 92 % to 76 % and that of OGEP drops from
94 % to 64 %. The R-Square values also indicate that UGEP ensures an accurate
prediction. It is also noticeable that UGEP executes faster than SA-MGEP and
OGEP while achieving these results.
The standard deviation for each dimension is presented in Table 4. The results
indicate that the UGEP algorithm performs as stably as SA-MGEP does in Sun Spot
time series prediction. The best functions obtained using UGEP can be written as the
following expressions:
Fig.7 Experimental results with UGEP,SA-MGEP and OGEP using SWPM
Table 4 The standard deviation with each dimension
                          Dimension 6   Dimension 10   Dimension 12
R-Square (UGEP)           0.00186       0.00127        0.00248
R-Square (SA-MGEP)        0.00172       0.00189        0.00283
R-Square (OGEP)           0.00242       0.00367        0.00642
Time (UGEP)               1.47 s        2.23 s         2.12 s
Time (SA-MGEP)            1.32 s        2.43 s         2.37 s
Time (OGEP)               1.74 s        2.26 s         2.38 s
Success rate (UGEP)       0.023         0.031          0.027
Success rate (SA-MGEP)    0.023         0.036          0.032
Success rate (OGEP)       0.024         0.022          0.014
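(1) Dimension is 6:
$$
\begin{aligned}
X_m ={}& \big(X_{m-1} / \exp(X_{m-4}) - \operatorname{sqrt}(\operatorname{abs}(X_{m-4}))\big) \\
&+ \big(X_{m-1} / (X_{m-5} - X_{m-3})\big) \ast \operatorname{sqrt}(\operatorname{abs}(X_{m-3})) \\
&+ 10^{\operatorname{sqrt}(\operatorname{abs}(\operatorname{sqrt}(\operatorname{abs}(0.999756)) / (X_{m-1} - X_{m-4})))} \\
&+ X_{m-2} / \exp(X_{m-5}) + X_{m-1}
\end{aligned}
$$
(R-Square = 0.9587)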
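(2) Dimension is 10:
$$
\begin{aligned}
X_m ={}& \exp\big(\operatorname{sqrt}(\operatorname{abs}(X_{m-7})) / X_{m-6} - \cos(X_{m-7})\big) \\
&+ \big(X_{m-10} / \operatorname{sqrt}(\operatorname{abs}(X_{m-7})) - \log(\operatorname{abs}(\tan(X_{m-6})))\big) \\
&+ \log(\operatorname{abs}(X_{m-10})) \\
&+ \cos(\operatorname{sqrt}(\operatorname{abs}(10^{X_{m-10}} / X_{m-8} + (X_{m-10} - X_{m-6})))) \\
&+ \tan(X_{m-8} - \operatorname{sqrt}(\operatorname{abs}(\log(\operatorname{abs}(X_{m-6})) \ast X_{m-9}))) \\
&+ \operatorname{sqrt}(\operatorname{abs}(((X_{m-4} - (-0.358257)) \ast X_{m-1})\;\text{[?]}\;(X_{m-1} / (-0.117832)))) \\
&+ \big(\operatorname{sqrt}(\operatorname{abs}(X_{m-4} / \sin(X_{m-1}))) - X_{m-4}\big) \\
&+ 10^{\operatorname{sqrt}(\operatorname{abs}(\sin(X_{m-5})))} \\
&+ \big(10^{\cos(X_{m-1} - X_{m-5})} + \operatorname{sqrt}(\operatorname{abs}(\operatorname{sqrt}(\operatorname{abs}(X_{m-4}))))\big) \\
&+ \operatorname{sqrt}(\operatorname{abs}(X_{m-1} \ast \sin(\log(\operatorname{abs}(X_{m-5})))))
\end{aligned}
$$
(R-Square = 0.9690)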
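(3) Dimension is 12:
$$
\begin{aligned}
X_m ={}& \log(\operatorname{abs}(\tan(\tan((X_{m-9} - 0.744011) \ast (X_{m-11} \ast (-0.385296)))))) \\
&+ \tan(\sin(\log(\operatorname{abs}(\log(\operatorname{abs}(\sin(X_{m-12} / X_{m-10}))))))) \\
&+ \log(\operatorname{abs}(\log(\operatorname{abs}(X_{m-12} \ast X_{m-11})))) \ast \operatorname{sqrt}(\operatorname{abs}(X_{m-11})) \\
&+ \log(\operatorname{abs}(\tan(\cos(X_{m-10} - X_{m-11})))) \\
&+ \log(\operatorname{abs}(\tan(X_{m-11}) + X_{m-11})) \\
&+ 10^{\exp(\cos(\log(\operatorname{abs}(\log(\operatorname{abs}(X_{m-5} + X_{m-7}))))))} \\
&+ \tan(\log(\operatorname{abs}(\sin(\log(\operatorname{abs}(\operatorname{sqrt}(\operatorname{abs}(X_{m-8} / (-0.202551))))))))) \\
&+ \log(\operatorname{abs}(X_{m-8})) \\
&+ \log(\operatorname{abs}(\log(\operatorname{abs}(\sin(\log(\operatorname{abs}(X_{m-7} \ast X_{m-5}))))))) \\
&+ \log(\operatorname{abs}(\log(\operatorname{abs}(\sin(\log(\operatorname{abs}(\log(\operatorname{abs}(X_{m-5}))))))))) \\
&+ \operatorname{sqrt}(\operatorname{abs}(X_{m-1}\;\text{[?]}\;((X_{m-1} \ast (-0.544969)) \ast \operatorname{sqrt}(\operatorname{abs}(X_{m-1}))))) \\
&+ \tan(\log(\operatorname{abs}(X_{m-2} + (-0.956297))) / \operatorname{sqrt}(\operatorname{abs}(\log(\operatorname{abs}(X_{m-4}))))) \\
&+ \tan(\log(\operatorname{abs}(X_{m-2}))\;\text{[?]}\;X_{m-1}) / \cos(-0.800043) \\
&+ \tan(\tan(X_{m-4})) \\
&+ \log(\operatorname{abs}(\tan(\operatorname{sqrt}(\operatorname{abs}(\operatorname{sqrt}(\operatorname{abs}(X_{m-2})))) - (X_{m-2} - X_{m-4}))))
\end{aligned}
$$
(R-Square = 0.9814).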
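To make the evolved expressions concrete, the dimension-6 model in (1) can be transcribed directly into code. The sketch below is such a transcription, with the window passed oldest value first. The name `predict_dim6` is illustrative, and plain (unprotected) division is an assumption, so a window with X_{m-5} = X_{m-3} or X_{m-1} = X_{m-4} raises `ZeroDivisionError`.

```python
import math


def predict_dim6(window):
    """Evaluate the dimension-6 expression for one sliding window.

    window = [x_{m-6}, x_{m-5}, x_{m-4}, x_{m-3}, x_{m-2}, x_{m-1}], oldest first.
    x_{m-6} is part of the window but does not appear in the expression.
    """
    _x6, x5, x4, x3, x2, x1 = window
    term1 = x1 / math.exp(x4) - math.sqrt(abs(x4))
    term2 = x1 / (x5 - x3) * math.sqrt(abs(x3))
    term3 = 10 ** math.sqrt(abs(math.sqrt(abs(0.999756)) / (x1 - x4)))
    term4 = x2 / math.exp(x5)
    return term1 + term2 + term3 + term4 + x1
```

The same pattern applies to the dimension-10 and dimension-12 expressions, which differ only in the number of window values they reference.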
5.3 Sun spot prediction via Differential Equation Prediction Method (DEPM)
The second round of experiments used a Differential Equation Prediction Method
(DEPM) [33]. The method first analyzes the whole test data (e.g., a time series). It
then constructs a differential equation and uses it to predict the future evolution of
the data. Considering the discretization of spatial partial derivatives and the
transformation of high-order equations into low-order ones, this study only involves
ordinary differential equations of order no higher than 3. We use Differential by
Microscope Interpolation (DMI), which incurs relatively low error and noise [11]. For
each configuration, 100 trials of experiments were executed. The averaged experimental
results are presented in Fig.8. The standard deviation of the results for each order is
presented in Table 5.
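DMI itself is not described in this section. Purely to illustrate what a derivative input to such a model looks like, the sketch below estimates first derivatives of an evenly sampled series with ordinary central differences. This is a simple stand-in and not the DMI procedure; the function name and the unit time step are assumptions.

```python
def central_differences(values, dt=1.0):
    """Estimate first derivatives at the interior points of an evenly spaced series.

    Uses (x[i+1] - x[i-1]) / (2 * dt); the two end points are skipped.
    """
    if len(values) < 3:
        raise ValueError("at least three samples are required")
    if dt <= 0:
        raise ValueError("dt must be positive")
    return [
        (values[i + 1] - values[i - 1]) / (2.0 * dt)
        for i in range(1, len(values) - 1)
    ]
```

For samples of t squared taken at t = 0, 1, 2, 3, 4, the estimates at t = 1, 2, 3 are 2, 4, and 6, which equal the exact derivative 2t.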
Experimental results indicate that UGEP always performs better than SA-MGEP
and OGEP do. Table 5 indicates that the UGEP algorithm performs stably in the
DEPM experiments. The fitting performance is the best when the 2-order differential
equation is adopted. The best model derived using UGEP is:
$$
\begin{aligned}
Y ={}& \tan(\sin(x - \sin(\exp(z)))) \\
&+ \tan(\sin((\exp(0.278970) + u) \ast \sin(x))) \\
&+ \big(\log(\operatorname{abs}(u)) - \tan(\cos(x + u))\big) \\
&+ \big(\cos(\cos(0.871639 + x)) + \tan(\log(\operatorname{abs}(u)))\big) \\
&+ \tan(\sin(z) + (x + u))
\end{aligned}
$$
(R-Square = 0.9891)
where “u” represents the actual observed time, “x” represents the actual observed
value, and “z” represents the 1-order derivative. The performance of UGEP drops only
slightly in the 3-order case in spite of the high computing complexity. Even in this
case, the value of R-Square still reaches 0.9688 and the success rate is 98 %, both of
which are significantly higher than those obtained with SA-MGEP and OGEP. UGEP again
executes faster than SA-MGEP and OGEP do.
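The best DEPM model above can be transcribed into code as follows. This is a direct transcription under two assumptions: `log` is taken to be the natural logarithm by default (the base is not stated here, so it is left as a parameter), and the function name `depm_model` is illustrative.

```python
import math


def depm_model(x, u, z, log=math.log):
    """Evaluate the best DEPM expression.

    x: observed value, u: observed time, z: 1-order derivative.
    log: logarithm to use; natural log by default (an assumption).
    """
    return (
        math.tan(math.sin(x - math.sin(math.exp(z))))
        + math.tan(math.sin((math.exp(0.278970) + u) * math.sin(x)))
        + (log(abs(u)) - math.tan(math.cos(x + u)))
        + (math.cos(math.cos(0.871639 + x)) + math.tan(log(abs(u))))
        + math.tan(math.sin(z) + (x + u))
    )
```

Passing `log=math.log10` evaluates the same expression with a base-10 logarithm. Because of the `log(abs(u))` terms, the expression is undefined at u = 0.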
Fig.8 Experimental results with UGEP and SA-MGEP using DEPM
Table 5 The standard deviation for the results
                          1-order    2-order     3-order
R-Square (UGEP)           0.00207    0.00213     0.00234
R-Square (SA-MGEP)        0.00312    0.00243     0.00313
R-Square (OGEP)           0.00325    0.003187    0.00347
Time (UGEP)               1.14 s     1.67 s      2.34 s
Time (SA-MGEP)            1.45 s     2.37 s      2.42 s
Time (OGEP)               1.224 s    1.47 s      2.44 s
Success rate (UGEP)       0.017      0.026       0.023
Success rate (SA-MGEP)    0.021      0.034       0.038
Success rate (OGEP)       0.015      0.027       0.022
6 Conclusions and future work
In this study, we examined the feasibility and effectiveness of a Uniform Design-aided
Gene Expression Programming (GEP) approach to solving symbolic regression
problems. GEP emerged as a salient variant of evolutionary computing approaches,
which significantly surpasses its counterparts such as GPs and GAs in dealing with
these problems. However, existing GEP algorithms still suffer from premature
convergence and slow evolution in anaphase.
We believe that the key to addressing these problems lies in maximizing the
diversity of chromosomes, which in turn demands an appropriate population
initialization approach. Based on this hypothesis, we developed a novel
GEP algorithm, namely uniform design GEP (UGEP). When initializing the population,
UGEP uses a mixed-level uniform table to ensure that the samples are representative
and well distributed. The size of the initial population (the set of samples) has
also been minimized to keep the cost of locating the optimal solution tolerable.
Furthermore, we developed a multiparent approach instead of using stochastic
evolution in the design of the crossover operator. The approach thoroughly hybridizes
multiple parents to increase the chance of obtaining offspring with high fitness values.
We performed a theoretical analysis on UGEP to examine its performance.It has
been mathematically proved that UGEP can always converge to the global optimal
solution.In terms of convergence speed,the number of optimal offspring at the sub-
sequent generation increases more quickly in UGEP than that in the original GEP.
A series of experiments have been carried out to investigate in depth the
performance of the proposed UGEP against existing GEP variants. The symbolic
regression problems under investigation include function fitting and chaotic time
series prediction. For the function fitting problem, ten sets of experiments were first
performed to search for the optimal parameter setting for the UGEP algorithm.
The R-Square value obtained using UGEP with the best parameter setting reaches
0.999795, while the best value using the original GEP is 0.986652; the execution
times of UGEP and OGEP are 18.32 s and 26.47 s, respectively. We then applied
the UGEP algorithm with this parameter setting to sun spot prediction using two
alternative methods, namely the Slide Window Prediction Method (SWPM) and the
Differential Equation Prediction Method (DEPM). In comparison with SA-MGEP and
OGEP, the R-Square values obtained using UGEP are always higher (e.g., with
dimension = 12, 0.9814 (UGEP) vs. 0.8677 (SA-MGEP) vs. 0.8842 (OGEP)). Moreover,
UGEP always converges faster than SA-MGEP and OGEP (e.g., with dimension = 12,
111.8 s (UGEP) vs. 131.2 s (SA-MGEP) vs. 148.6 s (OGEP)). UGEP also has a higher
rate of achieving the global optimal solution than SA-MGEP and OGEP (e.g., with
dimension = 12, 91 % (UGEP) vs. 76 % (SA-MGEP) vs. 64 % (OGEP)).
Both theoretic analysis and experimental results indicate that UGEP excels in
terms of both the capability of achieving the global optimum and the convergence
speed when dealing with symbolic regression problems.
For future work, we will consider the interactions among the parameters. Another
interesting direction is to use various uniform tables for population initialization
and for the crossover operator.
Acknowledgement This work is sponsored in part by the National Basic Research Program of China
(973 Program) under Grant No. 2011CB302303, the National Natural Science Foundation of China (Grant
Nos. 61272314, 60933002), the National High Technology Research and Development Program of China
(863 Program) under Grant No. 2013AA013203, the Specialized Research Fund for the Doctoral Program
of Higher Education (Grant No. 20110145110010), the Excellent Youth Foundation of Hubei Scientific
Committee (Grant No. 2012FFA025), the Program for New Century Excellent Talents in University (Grant
No. NCET-11-0722), and Wuhan Chenguang Project (2013070104010019). The authors would also like
to thank Dr. Siwei Jiang for the source code of SA-MGEP [20].
References
1.Goldberg DE (1989) Genetic algorithms in search,optimization,and machine learning.Addison-
Wesley,Reading
2.Koza JR (1992) Genetic programming:on the programming of computers by means of natural selec-
tion.In:Modeling adaptive multi-agent systems inspired by developmental biology,vol 229
3.Tsai CC,Huang HC,Chan CK (2011) Parallel elite genetic algorithm and its application to global
path planning for autonomous robot navigation.IEEE Trans Ind Electron 10:4813–4821
4.Chung SH,Chan H (2012) A two-level genetic algorithm to determine production frequencies for
economic lot-scheduling problem.IEEE Trans Ind Electron 59:611–619
5.Zhang X,Hu S,Chen D,Li X (2012) Fast covariance matching with fuzzy genetic algorithm.IEEE
Trans Ind Inform 8:148–157
6.Varadan V,Leung H (2001) Reconstruction of polynomial systems from noisy time-series measure-
ments using genetic programming.IEEE Trans Ind Electron 48:742–748
7.Ferreira C (2003) Function finding and the creation of numerical constants in gene expression pro-
gramming.In:Advances in soft computing:engineering design and manufacturing,vol 265
8.Li X,Zhou C,Xiao W,Nelson PC (2005) Prefix gene expression programming.In:Genetic and
evolutionary computation conference (GECCO05),Washington,DC,USA,pp 25–29
9.Ferreira C (2001) Gene expression programming:a new adaptive algorithm for solving problems.
arXiv:cs/0102027
10.Zuo J,Tang C,Zhang T (2002) Mining predicate association rule by gene expression programming.
In:Advances in web-age information management,pp 281–294
11.Zuo J,Tang C,Li C,Yuan C,Chen A (2004) Time series prediction based on gene expression pro-
gramming.In:Advances in web-age information management,pp 55–64
12.Peng J,Tang C,Li C,Hu J-J (2005) M-GEP:a new evolution algorithm based on multi-layer chro-
mosomes gene expression programming.Chin J Comput 28:1459–1466
13.Zhou C,Xiao W,Tirpak TM,Nelson PC (2003) Evolving accurate and compact classification rules
with gene expression programming.IEEE Trans Evol Comput 7:519–531
14.Karakasis VK,Stafylopatis A (2008) Efficient evolution of accurate classification rules using a com-
bination of gene expression programming and clonal selection.IEEE Trans Evol Comput 12:662–678
15.Abdelaziz AY,Mekhamer S,Khattab H,Badr M,Panigrahi BK (2012) Gene expression program-
ming algorithm for transient security classification.In:Swarm,evolutionary,and memetic computing.
Springer,Berlin,pp 406–416
16.Wang H,Liu S,Meng F,Li M (2012) Gene expression programming algorithms for optimization of
water distribution networks.Proc Eng 37:359–364
17.Ferreira C (2002) Mutation,transposition,and recombination:an analysis of the evolutionary dynam-
ics.In:The 6th joint conference on information sciences,4th international work shop on frontiers in
evolutionary algorithms
18.Shi K-F,Dong J-W,Li J-P,Shouning Q,Bo Y (2002) Orthogonal genetic algorithm.Acta Electron
Sin 10:1501–1504
19.Lopes HS,Weinert WR (2004) EGIPSYS:an enhanced gene expression programming approach for
symbolic regression problems.Int J Appl Math Comput Sci 14:375–384
20.Jiang S,Cai Z,Zuo D (2005) Parallel gene expression programming algorithm based on simulated
annealing method.Acta Electron Sin 33:2017–2021
21.Fang KT,Lin DKJ,Winker P,Zhang Y (2000) Uniform design:theory and application.Technometrics
42:237–248
22.Wang Y,Fang K (1981) A note on uniform distribution and experimental design.KeXue TongBao
26:485–489
23.Hu JJ,Tang CJ,Du L,Zuo J,Peng J (2007) The strategy for diversifying initial population of gene
expression programming.Chin J Comput 30:305–310
24.Koza JR (1994) Genetic programming II:automatic discovery of reusable programs.MIT Press
25.Fang KT,Yang ZH (2000) On uniform design of experiments with restricted mixtures and generation
of uniform distribution on some domains.Stat Probab Lett 46:113–120
26.Xing WX,Xie JX (1999) Advanced computational methods for optimization.Tsinghua University
Press,Tsinghua,pp 140–181
27.Wu S,Zhang Q,Chen H (1997) A new evolutionary algorithm based on family eugenics.J Softw 2
28.Leung YW,Wang Y (2001) An orthogonal genetic algorithm with quantization for global numerical
optimization.IEEE Trans Evol Comput 5:41–53
29.Huang CM,Lee YJ,Lin DKJ,Huang SY (2007) Model selection for support vector machines via
uniform design.Comput Stat Data Anal 52:335–346
30.Szpiro GG (1997) Forecasting chaotic time series with genetic algorithms.Phys Rev E 55:2557–2568
31.Cui B,Zhao Z,Tok W (2012) A framework for similarity search of time series cliques with natural
relations.IEEE Trans Knowl Data Eng 24:385–398
32.Khan SU,Bouvry P,Engel T (2012) Energy-efficient high-performance parallel and distributed com-
puting.J Supercomput 60:163–164
33.Khan SU,Min-Allah N (2012) A goal programming based energy efficient resource allocation in data
centers.J Supercomput 61:502–519