Title:
Identification of Pareto non-dominated sets of NONMEM models using a multi-objective evolutionary algorithm.
Authors:
Mark Sale* (1), Bruce G. Pollock (2), Robert R. Bies (2,3)
Institutions:
(1) Next Level Solutions, Raleigh, NC, USA and Indiana University, Division of Clinical Pharmacology,
Indianapolis, IN, USA; (2) Centre for Addiction and Mental Health, University of Toronto, Toronto, Ontario, Canada;
(3) Indiana University, Division of Clinical Pharmacology, Indianapolis, IN, USA.
Objectives:
Investigate the feasibility of a multi-objective evolutionary algorithm for identifying Pareto optimal solution sets for population PK modeling.
Methods:
The Pareto principle, also known as “the law of the vital few” [1] and the “80-20 rule”, is the observation that within a group of objects, only a relatively small number are ultimately important for the outcome. Using this approach, an evolutionary algorithm was developed to select Pareto sets of non-dominated models for a given population PK data set using NONMEM®. We have previously presented results from a simple evolutionary algorithm [2]. The simple evolutionary algorithm did not permit user-based, less quantitative, experience-driven input, such as biological feasibility or diagnostic plot evaluation, into the model selection process.
Multi-objective Pareto optimization is useful for scenarios where there is a tradeoff between multiple objectives, and therefore a single, strictly numerical solution isn’t reasonable. Rather, a set of solutions that appears to be the best, based on several objectives, is generated. This is the set of the “vital few”, those that seem to be relevant to the selection of the final model. Within this Pareto set of solutions, the tradeoff between these objectives can then be assessed using less quantitative means. Permitting less quantitative input into optimal solution identification has been an active area of research in evolutionary algorithms. We have applied this research to the problem of selection of population PK models.
Non-dominated models meet two requirements. For a model to be non-dominated, 1) it must be the case that there are no models in the solution set that are superior on all objectives; and 2) it must be the case that the model is superior to any other model in the solution space on at least one objective. These non-dominated models are identified from a solution set of NONMEM models.
The solution set is the set of all possible models that could result from combinations of various features. In this example, these features include number of compartments, presence/absence of a mixture model, various covariate relationships, various between-subject variance structures and various residual variance structures. The objectives used for the present search were:
-2 log likelihood (-2ll)
Number of parameters (both fixed and random)
“Quality” of solution from NONMEM, scored as 0-4, based on number of significant digits, successful covariance step and successful correlation test
Global adjusted p-value from NPDE [4]
This approach searches a candidate solution space of all proposed NONMEM models. In the current example, the solution space included 2.68*10^8 models. Of these, approximately 6000 were constructed, compiled and run by the algorithm. Using combinatorial optimization, the method then searches for the set of models that are non-dominated.
For example, for a given pair of models from the solution set, one may have a value for -2ll of 1000, with ten parameters. The other might have a value for -2ll of 1020 with nine parameters. Neither model “dominates” the other: the first is better as measured by -2ll, the second is better as measured by parsimony (number of parameters). However, in comparing a model with -2ll of 1000 and ten parameters to a model with -2ll of 1010 and 11 parameters, the first model “dominates” the second. The first model is not worse on any objective (-2ll and number of parameters), and better on at least one (or in this case, both).
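The pairwise comparison in this example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `dominates` is hypothetical, and it assumes every objective is expressed so that smaller values are better (as is the case for -2ll and number of parameters).

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if model a dominates model b.

    Assumption: every objective is minimized. a dominates b when it is
    no worse on any objective and strictly better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


# Objective tuples are (-2ll, number of parameters), taken from the text.
model_a = (1000, 10)
model_b = (1020, 9)
model_c = (1010, 11)

# Neither of the first pair dominates the other.
print(dominates(model_a, model_b), dominates(model_b, model_a))
# The first model dominates the 11-parameter model.
print(dominates(model_a, model_c))
```

Objectives where larger is better (for example the 0-4 “quality” score) would need to be negated or otherwise inverted before being passed to a function like this.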
Given this approach, non-dominated “fronts” can be defined. The first front is the set of non-dominated models. The second front is the set of non-dominated models after models in the first front are removed from the solution set, and so on. In this way, models in the solution set can be ranked based on which front they are a member of. Given this ranking, a genetic algorithm (in this case, binary tournament based) can be applied to search for the set of optimal (non-dominated) models in the solution space. Models that enter into the next generation are then selected using binary tournament selection based on the rank.
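The front-by-front ranking described above can be sketched as a simple "peel off" loop. This is an illustrative sketch only: it follows the verbal description rather than the faster bookkeeping used in the NSGA-II paper, the names `dominates` and `sort_into_fronts` are hypothetical, all objectives are assumed to be minimized, and model "D" is an invented example added to produce a third front.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def sort_into_fronts(models):
    """Rank models into successive non-dominated fronts.

    models maps a model name to its objective tuple (all minimized).
    Returns a list of fronts; fronts[0] is the first (best) front.
    """
    remaining = dict(models)
    fronts = []
    while remaining:
        # A model belongs to the current front if nothing left dominates it.
        front = sorted(
            name
            for name, obj in remaining.items()
            if not any(dominates(other, obj) for other in remaining.values())
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


# (-2ll, number of parameters); A, B and C come from the text, D is hypothetical.
models = {"A": (1000, 10), "B": (1020, 9), "C": (1010, 11), "D": (1030, 12)}
fronts = sort_into_fronts(models)
print(fronts)
```

The rank of a model is then its front index plus one, so a lower rank means a better front.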
For binary tournament, each model is selected in random sequence and compared to another randomly selected model. If the ranks of the models are different, the model with the lower rank is the winner and is selected as a “parent” for the next generation. If the ranks are the same, but one model dominates the other, the dominant model is selected. If the ranks are the same but neither model dominates the other, the model that is less “crowded” is selected. Crowding is a function of how close other models are to the present model in the front. In order to preserve a distribution of models across the front, models that are less crowded are preferred. This process of selecting each model randomly and comparing it to another random model is repeated a second time in order to preserve the number of models in the next generation.
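The tournament rules above can be sketched as follows. Several details are assumptions, not statements from the abstract: the crowding measure used here is the crowding distance from the NSGA-II paper [3] (larger distance means less crowded), exact ties are broken at random, all objectives are minimized, and the function names and the three-model front are hypothetical.

```python
import math
import random


def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def crowding_distance(front):
    """Crowding distance for each objective tuple in a single front."""
    n = len(front)
    dist = [0.0] * n
    for k in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        # Boundary models are always kept, so they get infinite distance.
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            gap = front[order[pos + 1]][k] - front[order[pos - 1]][k]
            dist[order[pos]] += gap / (hi - lo)
    return dist


def binary_tournament(a, b, rng):
    """Pick a parent: lower rank, then dominance, then less crowded."""
    if a["rank"] != b["rank"]:
        return a if a["rank"] < b["rank"] else b
    if dominates(a["obj"], b["obj"]):
        return a
    if dominates(b["obj"], a["obj"]):
        return b
    if a["crowd"] != b["crowd"]:
        return a if a["crowd"] > b["crowd"] else b
    return rng.choice([a, b])  # assumed tie-break


# Hypothetical first front of (-2ll, number of parameters) pairs.
front = [(1000, 12), (1005, 11), (1040, 8)]
crowd = crowding_distance(front)
candidates = [{"obj": o, "rank": 1, "crowd": c} for o, c in zip(front, crowd)]
rng = random.Random(0)
winner = binary_tournament(candidates[0], candidates[1], rng)
print(crowd, winner["obj"])
```

Because models in the same front cannot dominate one another by construction, the dominance check only matters if ranks are assigned in some coarser way; it is kept here to mirror the rules as stated.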
Two “parents” are selected in this way (with two binary tournaments). These parents are then used to create the next generation, after crossing over and mutating the bit strings. The specific algorithm used in this example is NSGA-II (Non-dominated Sorting Genetic Algorithm II) [3]. This process is repeated until the non-dominated set appears stable.
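A minimal sketch of the crossover and mutation step on bit strings is shown below. The abstract does not say which operators, rates or string length were used, so single-point crossover, independent bit-flip mutation, the 8-bit parents and the 0.05 mutation rate are all assumptions chosen for illustration; the function names are hypothetical.

```python
import random


def crossover(parent1, parent2, rng):
    """Single-point crossover: swap the tails of two equal-length bit lists."""
    point = rng.randrange(1, len(parent1))  # cut somewhere inside the string
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


rng = random.Random(1)
# Hypothetical parents; each bit would switch one model feature on or off.
parent1 = [0] * 8
parent2 = [1] * 8
child1, child2 = crossover(parent1, parent2, rng)
child1 = mutate(child1, 0.05, rng)
child2 = mutate(child2, 0.05, rng)
print(child1, child2)
```

Each child bit string would then be translated into a NONMEM control stream, run, and scored on the objectives before the next round of ranking and selection.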
The Pareto set of models in the non-dominated fronts (1 or more non-dominated fronts) at the conclusion of the optimization can then be presented to the user for additional evaluation (based on biological plausibility, diagnostic plots, etc.).
Results:
Figure 1 shows a plot of the values of -2ll vs. number of parameters in the first front for the 1st, 5th, 15th and 16th generations. This plot shows an essential tradeoff in modeling: goodness of fit vs. parsimony. It can be seen that in iterating from generations 1 to 5 to 15 the first front (set of non-dominated models) improves in the value of -2ll and/or number of parameters (shifting left and down, lower -2ll and fewer parameters). From generation 15 to generation 16 little additional improvement is seen, suggesting that convergence has occurred, with the non-dominated solution set being identified. Also note that anywhere along this front, the slope (or -1*slope, the decrease in -2ll/additional parameter) exceeds the AIC value for non-hierarchical models.
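One way to read the AIC remark is sketched below. It assumes the standard definition of AIC with p estimated parameters, which the abstract does not spell out, and the notation is introduced here for illustration only.

```latex
\begin{align*}
\mathrm{AIC}(p) &= -2\,\mathrm{ll}(p) + 2p \\
\mathrm{AIC}(p+1) - \mathrm{AIC}(p) &= \bigl[-2\,\mathrm{ll}(p+1)\bigr] - \bigl[-2\,\mathrm{ll}(p)\bigr] + 2 \\
\mathrm{AIC}(p+1) < \mathrm{AIC}(p) &\iff \bigl[-2\,\mathrm{ll}(p)\bigr] - \bigl[-2\,\mathrm{ll}(p+1)\bigr] > 2
\end{align*}
```

Under this reading, a decrease in -2ll of more than 2 per additional parameter along the front means that each step to a larger model on the front would also be preferred by AIC.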
Figure 2 shows a similar plot for number of parameters vs. -log(NPDE global p value). The p value for NPDE is presented as -log() for convenience in plotting only. The tradeoff between NPDE and number of parameters is not as clear as is seen for -2ll vs. number of parameters, but a subtle relationship is still observed, with more complex models (more parameters) tending to have a better p value for NPDE. Interestingly, the NPDE p value falls (larger -log(NPDE)) at the largest values for number of parameters, perhaps suggesting overparameterization.
Conclusions:
A multi-objective evolutionary algorithm is capable of identifying a Pareto set of non-dominated population PK models. The objectives used are -2ll, number of parameters, “quality” of the model results and the global p value from NPDE. Additional objectives that could be added include simulation-based metrics such as PPC. The results of this Pareto search can then be presented to the user, possibly sorted by other criteria (AIC, DIC, NPDE, successful convergence, successful covariance, etc.), with appropriate graphics generated for each, to be used for additional model selection.
References:
[1] Vilfredo Pareto. Manuale di economia politica, 1906.
[2] Robert R. Bies, Matthew F. Muldoon, Bruce G. Pollock, Steven Manuck, Gwenn Smith and Mark E. Sale. A Genetic Algorithm-Based, Hybrid Machine Learning Approach to Model Selection. Journal of Pharmacokinetics and Pharmacodynamics, 33, (2), 195-221, 2006.
[3] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. A Fast and Elitist Multi-objective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6, (2), 2002.
[4] Emmanuelle Comets, France Mentré. Using simulations-based metrics to detect model misspecifications. PAGE meeting 2010.
Figure 1
Figure 2