The Efficient Set GA for Stock Portfolios
Jacqueline Shoaf and James A. Foster
Dept. of Computer Science, University of Idaho, Moscow, Idaho
email: jackies@alaska.net, foster@cs.uidaho.edu
Abstract
The genetic algorithm (GA) for the efficient set portfolio problem based on the Markowitz model, introduced by Shoaf and Foster [4], offers significant benefits over the quadratic programming approach. These benefits include simultaneous optimization of risk and return. The efficient set GA uses an indirect representation style in order to avoid infeasible solutions and penalty functions. This representation is generally applicable to problems which seek an optimal partition for a given amount of some resource which includes both negative and positive allocations. Efficient set GA evolution scales well and is O(n log n) with a small constant for portfolios containing up to n=100 stocks. Using demes further improves the quality of solution and the run time for this GA.
1. Introduction
The efficient set portfolio problem is to find the allocation of investments for a given set of securities with minimum risk for any given rate of return. Markowitz’s [2] approach to solving this problem uses the covariance matrix derived from historical rates of return to predict the variance, or “risk factor”, of any allocation of resources. The Markowitz model accommodates both long and short positions. A long position represents an allocation for purchase of securities, whereas a short position represents an allocation from the sale of borrowed securities.
In the Markowitz model, the weighted sum of the values in the rates of return covariance matrix represents the overall variance, $\sigma^2(r_p)$, of a portfolio. Let $n$ be the number of stocks in the portfolio, $x_i$ be the proportion of resources allocated for stock $i$ (negative for short positions), $E(r_p^*)$ be the given expected rate of return for the portfolio, and $E(r_j)$ be the expected rate of return for each security. Writing $\mathrm{Cov}(r_i, r_j)$ for the covariance matrix entry for stocks $i$ and $j$, the objective equation for the efficient set portfolio problem is:
$$\min \sigma^2(r_p) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j \, \mathrm{Cov}(r_i, r_j)$$
with the following constraints:
1) $E(r_p^*) = \sum_{j=1}^{n} x_j E(r_j)$
2) $1.0 = \sum_{j=1}^{n} x_j$
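As a small illustration of the objective and constraints above, the following Python sketch evaluates the variance and expected return of a candidate allocation. The function names and the two-stock data are hypothetical, chosen only to keep the arithmetic easy to follow; they are not taken from the experiments.

```python
def portfolio_variance(x, cov):
    """Weighted sum of covariance entries: sum over i, j of x[i] * x[j] * cov[i][j]."""
    n = len(x)
    return sum(x[i] * x[j] * cov[i][j] for i in range(n) for j in range(n))


def portfolio_return(x, expected):
    """Expected portfolio return: sum over j of x[j] * E(r_j)."""
    return sum(xj * ej for xj, ej in zip(x, expected))


# Hypothetical two-stock example (not data from the paper).
cov = [[0.04, 0.01],
       [0.01, 0.09]]
expected = [0.10, 0.20]
x = [1.5, -0.5]  # long 150% in stock 0, short 50% in stock 1

assert abs(sum(x) - 1.0) < 1e-12      # constraint 2: allocations sum to 1.0
print(portfolio_variance(x, cov))     # risk of this allocation
print(portfolio_return(x, expected))  # expected return of this allocation
```

A short position simply enters as a negative weight, so the same two functions cover both long and short allocations.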
The quadratic programming approach to this problem described by Haugen [1, Appendix 3] requires the objective equation to be rewritten in Lagrangian form. For minimization, the partial derivative with respect to each variable is taken and set to 0. This leaves a set of linear simultaneous equations which can be solved for the coefficients of allocation for the minimum variance portfolio. This portfolio will have the given expected rate of return, $E(r_p^*)$, which is specified as a constraint. Note that $E(r_p^*)$ is required as input, so the quadratic programming approach can only solve the efficient set problem for one portfolio rate of return at a time. Also, the algorithm for solving a set of linear simultaneous equations has time complexity between $O(n^2)$ and $O(n^3)$, according to Smith [5].
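The Lagrangian step can be sketched as follows. This is the standard textbook form for a variance minimization with two equality constraints, not a transcription of Haugen's appendix; the multiplier symbols $\lambda_1$ and $\lambda_2$ and the notation $\mathrm{Cov}(r_i, r_j)$ for covariance matrix entries are assumptions introduced here.

```latex
\begin{aligned}
\mathcal{L}(x, \lambda_1, \lambda_2)
  &= \sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j \,\mathrm{Cov}(r_i, r_j)
   + \lambda_1 \Big( E(r_p^*) - \sum_{j=1}^{n} x_j E(r_j) \Big)
   + \lambda_2 \Big( 1 - \sum_{j=1}^{n} x_j \Big) \\
\frac{\partial \mathcal{L}}{\partial x_i}
  &= 2 \sum_{j=1}^{n} x_j \,\mathrm{Cov}(r_i, r_j) - \lambda_1 E(r_i) - \lambda_2 = 0,
  \qquad i = 1, \ldots, n \\
\frac{\partial \mathcal{L}}{\partial \lambda_1}
  &= E(r_p^*) - \sum_{j=1}^{n} x_j E(r_j) = 0 \\
\frac{\partial \mathcal{L}}{\partial \lambda_2}
  &= 1 - \sum_{j=1}^{n} x_j = 0
\end{aligned}
```

Because the covariance matrix is symmetric, every condition is linear in the unknowns, giving $n+2$ linear equations in the $n+2$ unknowns $x_1, \ldots, x_n, \lambda_1, \lambda_2$.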
2. The Efficient Set GA
The GA solution to the efficient set problem of Shoaf and Foster [4] alters the problem slightly to solve for an efficient set portfolio over the entire range of potential expected portfolio returns. Each member of the GA population represents an allocation of resources for the portfolio. The user selects a desirable balance between risk and return using adjustable constants in the GA fitness function. These constants are set by the user; $E(r_p)$ represents the expected rate of return of the portfolio represented by the population member, and $E(r_p^*)$ represents the user’s target expected rate of portfolio return.
Because the efficient set portfolio problem is an allocation problem, a direct representation of resource allocation by each population member in the GA will not work well. This type of representation will result in predominantly infeasible solutions in every generation, where the allocations do not sum to 1.0.
Our representation has a single field of $k+1$ bits for each security. The first bit indicates whether the position on that stock will be long (one) or short (zero). The remaining $k$ bits are an unsigned index onto an “allocation wheel”. Conceptually, this is a wheel representing the resources to be allocated. It is divided into $2^k$ equal sections, each indexed by a $k$-bit binary value. The distance between an index and the index of the next long position, plus any enclosed short position wedges, is the percentage of the total resource allocation for the security with that index. (See Figure 1.)
More precisely, suppose that $i_1, \ldots, i_n$ are the indices for $n$ securities, in non-decreasing order, mod $2^k$ (so that 000, for example, follows 111). Now, let $S$ be the set of indexes of securities with short positions. Let $L(j)$ be the next index of a long position on the allocation wheel. Now, let $d_j$ be:
with $a=-1$ and $b=1$ when $j \in S$, and $a=1$ and $b=0$ otherwise.
Pictorially, $d_j$ is the length of the arc between the index $i_j$ and the next long security, with any subtended short position arcs added in. Figure 1 is an example of this representation for $k=3$ and $n=5$.
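To make the wheel concrete, here is a minimal Python sketch of one possible reading of the decoding rule. It assumes that a short position receives the negated arc from its index to the next long index, that a long position receives its own arc plus the magnitudes of the short arcs it encloses, and that index ties are broken by security number. The function name and the index layout in the example are hypothetical; the layout was chosen so that the result matches the allocation values listed with Figure 1.

```python
def decode_allocations(fields, k):
    """fields: one (is_long, index) pair per security, with 0 <= index < 2**k.
    Returns the allocation proportions in the same order as the input."""
    size = 2 ** k
    n = len(fields)
    if not any(is_long for is_long, _ in fields):
        raise ValueError("need at least one long position")
    # Wheel order; ties are broken by security number (an assumption).
    order = sorted(range(n), key=lambda j: (fields[j][1], j))
    arcs = [0] * n                     # sections from each security to the next long
    enclosed = [[] for _ in range(n)]  # short positions passed on the way
    for pos, j in enumerate(order):
        p = pos
        while True:
            q = (p + 1) % n
            gap = fields[order[q]][1] - fields[order[p]][1]
            if q == 0:                 # wrapped past the top of the wheel
                gap += size
            arcs[j] += gap
            p = q
            if fields[order[p]][0]:    # reached the next long position
                break
            enclosed[j].append(order[p])
    allocations = []
    for j, (is_long, _) in enumerate(fields):
        if is_long:
            sections = arcs[j] + sum(arcs[s] for s in enclosed[j])
        else:
            sections = -arcs[j]
        allocations.append(sections / size)
    return allocations


# Hypothetical layout for k=3, n=5: (is_long, index) for stocks 0 to 4.
example = [(True, 0), (False, 2), (False, 7), (True, 1), (True, 4)]
print(decode_allocations(example, 3))
```

Under this reading the long arcs tile the whole wheel and every short arc is counted once negatively and once positively, so the allocations sum to 1.0 for any chromosome with at least one long position.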
An important benefit of this representation is that $\sum_{j=1}^{n} d_j = 1$ for any set of short positions $S$ and any chromosome. That is, the total investment is always one hundred percent of the available resources. This makes it impossible to create an infeasible solution. So, every member of the population in each generation can contribute to the next generation and no valuable schemata are discarded. This also avoids any unpredictable influence on evolution by penalty functions in the fitness function, since they are not necessary. In general this representation makes the GA efficient. This indirect style of representation should work for any optimization problem where allocation proportions may be either positive or negative and must sum to a given value.
Sum of allocations for Stocks 0-4 (in order): .125 + (-.25) + (-.125) + .625 + .625 = 1.00
Figure 1. Allocation based on solution representation
One of the less obvious effects of this representation style, however, is the sensitivity of the efficient set GA to increases in mutation and crossover rates. A change in the index of one security can affect one to two other security allocations.
Experiments using a small set of 5 stocks [3] compared the efficient set GA to the quadratic programming technique and demonstrated the benefits of simultaneously optimizing for return and risk. These experiments demonstrated that the GA could find portfolio allocations with similar risk and higher rates of return than the risk-constrained quadratic programming solution.
The data for these experiments was derived from end-of-week closing data accumulated over an eleven-month period beginning October 3, 1994. The covariance matrix for stock rates of return is shown in Table 1 and the averaged annualized rates of return are shown in Table 2.
Table 1. Stock covariance matrix
        CYBE    ISLI    NBL     ORLY
CYBE
ISLI    2.45
NBL     1.36    2.76
ORLY    0.07   -0.10    0.99
RGIS    0.55    0.44   -0.02    0.68
Table 2. Average Rate of Return
CYBE     ISLI     NBL      ORLY     RGIS
0.589   -1.573    1.219    0.159   -0.094
The allocation proportions, obtained using the quadratic programming technique with a given expected return rate of .15, are shown in Table 3. This solution yields a minimum risk, $\sigma^2(r_p)$, of .405.
Table 3. Allocation by quadratic programming
CYBE     ISLI     NBL      ORLY     RGIS
-0.10    0.15     0.30     0.55     0.10
Five randomly-seeded GA runs were conducted using the same data [3]. The length of each chromosome was 35 bits, with individual fields of 7 bits. Each run lasted 200 generations using a population size of 300.
The allocation proportions obtained using the GA are shown in Table 4. This solution is the best over all five runs. These allocations yield an expected rate of return, $E(r_p)$, of .52 with an associated minimum risk, $\sigma^2(r_p)$, of .384. In this case the efficient set GA was clearly able to find a better solution than quadratic programming.
Table 4. Allocation by GA
CYBE     ISLI     NBL      ORLY     RGIS
0.0      0.047    0.422    0.516    0.016
Figure 2.
The convergence graph of the averaged GA solution, which plots average generation fitness and best-of-generation fitness against generation, is shown in Figure 2. The plot demonstrates the expected exponential increase in fitness values.
3. Complexity of the Efficient Set GA
We also ran experiments to determine the expected time complexity of the GA as a function of $n$, the number of securities in the portfolio. Since $n$ is a variable that is local only to the objective function of the GA, this would confirm that the time complexity of the fitness function dominates the GA as $n$ increases. An efficient algorithm in terms of $n$ implies that the GA can be used for portfolio allocations involving large numbers of securities. Notice that the size of the chromosomes, and therefore the complexity of the GA, depends on both the number of securities and the number of slices on the allocation wheel, so it is not obvious a priori that the number of securities is the critical performance parameter.
Each chromosome contains $n$ fields, each representing an investment position (long or short) and the index used for allocation for one security. Thus the product of $n$ and the field size determines the total length of each chromosome. However, the fitness function does a number of transformations based on the values in each field, in order to convert the chromosome into a portfolio allocation. These transformations affect the algorithmic time complexity of the GA. The indirect representation method for allocation determination described earlier requires that the fields of the chromosome be sorted, an operation of average expected complexity in O(n log n) using a quicksort (the $O(n^2)$ worst case behavior rarely shows up in practice). We anticipated that sorting would dominate the time complexity of the fitness function and the GA.
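The sorting step can be illustrated with a short Python sketch that splits a chromosome into fields and orders the securities by wheel index. The function names and the bit string are hypothetical, and Python's built-in `sorted` (a merge-based sort, also O(n log n)) stands in for the quicksort mentioned above.

```python
def parse_fields(chromosome, field_size):
    """Split a bit string into (is_long, index) pairs, one per security.
    The first bit of each field is the position (1 = long, 0 = short);
    the remaining field_size - 1 bits are the unsigned wheel index."""
    if len(chromosome) % field_size != 0:
        raise ValueError("chromosome length must be a multiple of field_size")
    fields = []
    for start in range(0, len(chromosome), field_size):
        field = chromosome[start:start + field_size]
        fields.append((field[0] == "1", int(field[1:], 2)))
    return fields


def wheel_order(fields):
    """Security numbers sorted by wheel index: the O(n log n) step."""
    return sorted(range(len(fields)), key=lambda j: fields[j][1])


# Hypothetical chromosome: 5 securities, field size 4 (k = 3).
chromosome = "1000" "0010" "0111" "1001" "1100"
fields = parse_fields(chromosome, 4)
print(fields)
print(wheel_order(fields))
```

Parsing is linear in the chromosome length, so for a fixed field size the sort is the only step in this sketch that grows faster than linearly in n.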
The experiments consisted of sets of 25 GA runs, one set for each of 5 different values of $n$: 8, 16, 32, 64 and 100. Time was clocked on either side of the evolution step in the GA in order to bypass time required by chromosome initialization, which is assumed to be linear in $n$ with a small constant. Between the sets, all other GA constants remained the same. However, since the absolute length of the chromosome is also affected by the change in $n$, a separate set of experiments was conducted to determine which type of change dominated time complexity of the GA.
A fixed population size of 100 chromosomes
and runs lasting 100 generations were used.
These and other basic GA parameters are noted
in Table 5.
Table 5. Basic GA parameters for complexity experiments
GA Type            Simple
Crossover          2-Pt
Selection          Roulette-Wheel
Field Size         7
Population Size    100
Crossover Rate     .6
Mutation Rate      .001
Generations        100
The averaged experimental results are summarized on the chart in Figure 3. Let $t$ be the time, in seconds, required to evolve this population 100 generations. The algorithmic complexity of the fitness function based on the data used here appears to fit the equation $t = F(n) = c \cdot n \log_2 n$ very well. In this case the constant $c$ is between .1 and .2.
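For a sense of scale, the following Python sketch evaluates the fitted model at the five tested portfolio sizes for the two bounding constants. The function name is hypothetical, and the printed values are only what the model predicts, not the measured timings.

```python
import math


def predicted_seconds(n, c):
    """Run time predicted by the fitted model t = c * n * log2(n)."""
    return c * n * math.log2(n)


for n in (8, 16, 32, 64, 100):
    low = predicted_seconds(n, 0.1)
    high = predicted_seconds(n, 0.2)
    print(f"n={n:3d}: between {low:6.1f} and {high:6.1f} seconds")
```

Doubling n from 32 to 64 multiplies the predicted time by 2.4 rather than by 4, which is the practical difference between n log n and quadratic growth.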
Figure 3
In order to determine whether the GA run time was dominated by changes in $n$ or by changes in overall chromosome length, we ran an additional set of experiments in which $n$, the number of securities, remains constant but the chromosome length changes based on the field size.
The number of bits in a field determines the minimum allocation proportion for any stock in the portfolio and the maximum number of securities in the portfolio with allocations greater than 0. Increasing the number of securities in the portfolio also makes it desirable to increase the field size to reduce the likelihood of index collision. So, in practice these two values should be correlated. But for our experiments, we used a constant number of fields (securities), $n=32$, and varied the size of each field from 3 to 7, effectively changing the chromosome length from 96 to 224 bits. With the exception of field size, all other parameters from Table 5 remained the same.
The averaged results from sets of 25 experiments in each configuration, summarized in the chart in Figure 4, show that increasing the number of bits in each field has a more gradual effect on time complexity than increasing $n$, the number of fields. The graph shows steady linear growth in time with increasing chromosome length. For this population, $t = a \cdot c \cdot (\text{bits per field})$, with $a=14.3$ and $c=0.49$. The effect of increasing the number of bits in the chromosome, without increasing $n$, may be to increase the time required for the mutation operation, which is dependent only on the total number of bits.
Figure 4.
Therefore, the empirical data from these experiments confirms that overall algorithmic time complexity for this GA application is affected mainly by changes in the number of securities in the portfolio, $n$, and that the order of expected time complexity for the fitness function and the GA is O(n log n) with small constants.
4. The Deme Modification
A natural modification for improving time efficiency for the efficient set GA over multiple single population runs is the use of a deme model. In a deme model, subpopulations evolve independently and migrate their most highly fit members periodically. The deme model was designed to allow parallel evolution on a multiprocessor system. In addition to improving time efficiency, the deme model may improve the capability of the efficient set GA. For the efficient set GA the deme model appears to be better than multiple single runs because it provides for alternating periods of local hill-climbing and global competition between improved local optima.
The single population and deme models are compared here based on the number of generation steps rather than absolute GA runtime. There are several reasons for this. The deme modification that was implemented for these experiments requires a steady state GA framework, one in which only a small proportion of deme members are replaced every generation. The deme model has a migration step, which is absent from the single population model. And, although the deme model can be run on a multiprocessor system, our implementation was designed for a single processor system with deme evolution implemented sequentially.
Because of the disparity in the models, this comparison is based on the quality of optimal results between the two models after 2000 generation steps rather than on absolute runtime. A generation step for the deme model includes one generation in each of the subpopulations, since the deme model can potentially allow evolution to proceed simultaneously for each subpopulation on a multiprocessor system.
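The epoch structure described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the experiments: the ring migration topology, the rule that migrants replace the weakest members of the receiving deme, and the names `migrate`, `run_demes` and `evolve` are all assumptions.

```python
def migrate(demes, fitness, n_migrants=1):
    """Copy each deme's fittest members into the next deme in a ring,
    where they replace that deme's least fit members."""
    bests = [sorted(d, key=fitness, reverse=True)[:n_migrants] for d in demes]
    new_demes = []
    for idx, deme in enumerate(demes):
        incoming = bests[(idx - 1) % len(demes)]
        keep = len(deme) - len(incoming)
        survivors = sorted(deme, key=fitness, reverse=True)[:keep]
        new_demes.append(survivors + list(incoming))
    return new_demes


def run_demes(demes, fitness, evolve, epochs, gens_per_epoch):
    """Evolve every deme independently for one epoch, then migrate; repeat.
    `evolve` is a caller-supplied function mapping a deme to its next generation."""
    for _ in range(epochs):
        for _ in range(gens_per_epoch):
            demes = [evolve(deme) for deme in demes]
        demes = migrate(demes, fitness)
    return demes


# Toy check with integers as "chromosomes" and the value itself as fitness.
toy = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(migrate(toy, fitness=lambda m: m))
```

On a single processor the inner list comprehension evolves the demes one after another; on a multiprocessor system the same step could run once per deme in parallel, which is why a generation step counts one generation in every subpopulation.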
Table 6.
Experiment Set    1 (10 runs)    2 (10 runs)    3 (10 runs)
GA Type           Simple         Simple         Steady State/Deme
Populations       1              1              5
Gens              2000           2000           100
Epochs            N/A            N/A            20 Gens/Epoch
Replace/Gen       All            All            5
PopSize           100            100            100
Xover             2-Pt           2-Pt           2-Pt
Select            Roulette       Roulette       Roulette
Fields            6              6              6
Stock Set         10             10             10
Xover Rate        .6             .6             .7
Mut Rate          .001           .007           .007
The GA parameters are shown in Table 6. Two sets of experiments were run for the single population model. In the first set, mutation and crossover rates were the highest possible (within .001 and .1 increments, respectively) that would still allow convergence and guarantee a final local hill-climbing phase. The second set was run at a much higher mutation rate with no convergence (there were fewer than 3 matched population members in any final generation from set #2) to see whether it would be possible to find a better solution with a more random search. In general, higher mutation rates provide the longest possible solution space exploration phase in the efficient set GA, which may be a result of the potentially highly multimodal solution space.
The fitness profile (Figure 5) from one of the deme runs in set #3 demonstrates the alternating influences of local improvement with total population competition during evolution. The results of the three sets of experiments, in terms of the statistics for the generated portfolios, are shown in Table 7. While the empirical data is very limited, it serves to illustrate how well the deme model works for this GA. The portfolio statistics, which reflect optimal fitness statistics, show that the deme model has the potential to produce results comparable to and better than the single population GA for the same time resource or number of generations. Assuming the time for the migration step is minimal, the time resource required for multiple GA runs can be cut geometrically by the use of the deme model with no loss of capability.
Figure 5.
Table 7. Portfolio statistics
        Return                 Risk
Set     Mean     Std. Dev.     Mean     Std. Dev.
1       .325     .069          .485     .122
2       .366     .074          .426     .022
3       .398     .063          .424     .017
Conclusions
We compared a simple GA approach to solving the efficient set problem to the more traditional quadratic programming approach using covariances. The GA can simultaneously minimize risk and maximize expected return, whereas the quadratic programming approach must hold the risk constant. This flexibility allows the GA to discover portfolio opportunities that the more traditional approach misses.
We also examined the expected time complexity of the GA solution. Our experiments show that when the GA is run with portfolios of up to n=100 stocks, the expected time complexity of the genetic algorithm is O(n log n) with a very small constant. This is greatly superior to the time complexity of quadratic programming. Moreover, the GA complexity can be primarily attributed to the fitness function, which produces a portfolio allocation from an indirect solution representation. The representation style is advantageous because it eliminates the possibility of infeasible solutions and the need for penalty functions. Additional experiments demonstrate that the O(n log n) complexity of the fitness function overshadows the linear relationship between overall length of the chromosome and GA runtime.
Finally, we demonstrated the effectiveness of using demes for this GA. This modification was shown to have the potential of finding solutions comparable and possibly superior to those gained from multiple single population runs. A contributing factor to the success of this type of modification may be the highly multimodal character of the potential solution space in the efficient set problem.
Acknowledgements
The software for this work used the GAlib genetic algorithm package, written by Matthew Wall at the Massachusetts Institute of Technology.
References
[1] Haugen, R.A., Modern Investment Theory, Prentice Hall Inc., Englewood Cliffs, N.J., 1993.
[2] Markowitz, H.M., Portfolio Selection, Basil Blackwell, Inc., Cambridge, MA, 1991.
[3] Shoaf, J.S. and Foster, J.A., “A Genetic Algorithm Solution to the Efficient Set Problem: A Technique for Portfolio Selection Based on the Markowitz Model”, Tech Report, Dept. of Computer Science, Univ. of Idaho, Moscow, ID, 1995.
[4] Shoaf, J.S. and Foster, J.A., “A Genetic Algorithm Solution to the Efficient Set Problem: A Technique for Portfolio Selection Based on the Markowitz Model”, Proc. 1996 Annual Meeting, Vol. 2, Decision Sciences Institute, Orlando, FL, 1996, pp. 571-573.
[5] Smith, H.A., Data Structures: Form and Function, Harcourt Brace Jovanovich, Inc., San Diego, CA, 1987.