Title: Identification of Pareto non-dominated sets of NONMEM models using a multi-objective evolutionary algorithm.

Authors: Mark Sale* (1), Bruce G. Pollock (2), Robert R. Bies (2,3)

Institutions: (1) Next Level Solutions, Raleigh, NC, USA and Indiana University, Division of Clinical Pharmacology, Indianapolis, IN, USA; (2) Centre for Addiction and Mental Health, University of Toronto, Toronto, Ontario, Canada; (3) Indiana University, Division of Clinical Pharmacology, Indianapolis, IN, USA.

Objectives: Investigate the feasibility of a multi-objective evolutionary algorithm in identifying Pareto optimal solution sets for population PK modeling.

Methods: The Pareto principle, also known as “the law of the vital few” [1] and the “80-20 rule”, is the observation that within a group of objects, only a relatively small number are ultimately important for the outcome. Using this approach, an evolutionary algorithm was developed to select Pareto sets of non-dominated models for a given population PK data set using NONMEM®. We have previously presented results from a simple evolutionary algorithm [2]. The simple evolutionary algorithm did not permit user-based, less quantitative, experience-driven input, such as biological feasibility or diagnostic plot evaluation, into the model selection process. Multi-objective Pareto optimization is useful for scenarios where there is a tradeoff between multiple objectives, and therefore a single, strictly numerical solution isn’t reasonable. Rather, a set of solutions that appear to be the best, based on several objectives, is generated. This is the set of the “vital few”, those that seem to be relevant to the selection of the final model. Within this Pareto set of solutions, the tradeoff between these objectives can then be assessed using less quantitative means. Permitting less quantitative input into optimal solution identification has been an active area of research in evolutionary algorithms. We have applied this research to the problem of selection of population PK models.

Non-dominated models meet two requirements. For a model to be non-dominated, 1) there must be no models in the solution set that are superior on all objectives; and 2) compared with any other model in the solution space, the model must be superior on at least one objective. These non-dominated models are identified from a solution set of NONMEM models. The solution set is the set of all possible models that could result from combinations of various features. In this example, these features include number of compartments, presence or absence of a mixture model, various covariate relationships, various between-subject variance structures and various residual variance structures. The objectives used for the present search were:

• -2 log likelihood (-2ll)
• Number of parameters (both fixed and random)
• “Quality” of solution from NONMEM, scored as 0-4, based on number of significant digits, successful covariance step and successful correlation test
• Global adjusted p-value from NPDE [4]

This approach searches a candidate solution space of all proposed NONMEM models. In the current example, the solution space included 2.68 × 10^8 models. Of these, approximately 6000 were constructed, compiled and run by the algorithm. Using combinatorial optimization, the method then searches for the set of models that are non-dominated. For example, for a given pair of models from the solution set, one may have a value for -2ll of 1000, with ten parameters. The other might have a value for -2ll of 1020 with nine parameters. Neither model “dominates” the other: the first is better as measured by -2ll, the second is better as measured by parsimony (number of parameters). However, in comparing a model with -2ll of 1000 and ten parameters to a model with -2ll of 1010 and 11 parameters, the first model “dominates” the second. The first model is no worse on any objective (-2ll and number of parameters), and better on at least one (or in this case, both).
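
The dominance comparison in the example above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `dominates` and the tuple layout (-2ll, number of parameters) are assumptions, and every objective is treated as something to be minimized.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if model a dominates model b (all objectives minimized).

    a dominates b when a is no worse on every objective and strictly
    better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


# Hypothetical encoding of the models in the text: (-2ll, number of parameters)
model_a = (1000.0, 10)
model_b = (1020.0, 9)
model_c = (1010.0, 11)

print(dominates(model_a, model_b))  # False: model_b has fewer parameters
print(dominates(model_b, model_a))  # False: model_a has the lower -2ll
print(dominates(model_a, model_c))  # True: model_a is better on both
```

With more objectives (for example the quality score or the NPDE p-value), each would first need to be expressed so that smaller is better before being added to the tuple.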

Given this approach, non-dominated “fronts” can be defined. The first front is the set of non-dominated models. The second front is the set of non-dominated models after models in the first front are removed from the solution set, and so on. In this way, models in the solution set can be ranked based on which front they belong to. Given this ranking, a genetic algorithm (in this case, binary tournament based) can be applied to search for the set of optimal (non-dominated) models in the solution space. Models that enter the next generation are then selected using binary tournament selection based on the rank.
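
The ranking into fronts described above can be sketched as repeated “peeling” of the non-dominated set. This is a simple illustration under stated assumptions: the names `sort_into_fronts` and `scores` are hypothetical, all objectives are minimized, and the straightforward loop below is not the faster bookkeeping scheme used in the NSGA-II paper.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def sort_into_fronts(scores):
    """Return a list of fronts; each front is a sorted list of model indices."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        # A model belongs to the current front if no remaining model dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical (-2ll, number of parameters) pairs for four models
scores = [(1000, 10), (1020, 9), (1010, 11), (1030, 12)]
fronts = sort_into_fronts(scores)

# Rank 1 is the first front, rank 2 the second, and so on.
rank = {i: r for r, front in enumerate(fronts, start=1) for i in front}
print(fronts)  # [[0, 1], [2], [3]]
print(rank)
```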

For binary tournament, each model is selected in random sequence and compared to another randomly selected model. If the ranks of the models are different, the model with the lower rank is the winner and is selected as a “parent” for the next generation. If the ranks are the same, but one model dominates the other, the dominant model is selected. If the ranks are the same but neither model dominates the other, the model that is less “crowded” is selected. Crowding is a function of how close other models are to the present model in the front. In order to preserve a distribution of models across the front, models that are less crowded are preferred.
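
A sketch of the tournament rule above follows. The crowding measure used here is the crowding distance from the NSGA-II paper [3] (normalized gap between a model's neighbours on each objective, with the end points of the front given infinite distance); the text does not state the exact crowding function used, so that choice, the function names and the example values are all assumptions.

```python
import math
import random


def dominates(a, b):
    """True if a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def crowding_distance(front_scores):
    """Crowding distance for each model in one front (larger = less crowded)."""
    n = len(front_scores)
    distance = [0.0] * n
    for m in range(len(front_scores[0])):
        order = sorted(range(n), key=lambda i: front_scores[i][m])
        lo, hi = front_scores[order[0]][m], front_scores[order[-1]][m]
        # Boundary models are always kept: give them infinite distance.
        distance[order[0]] = distance[order[-1]] = math.inf
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = front_scores[order[k + 1]][m] - front_scores[order[k - 1]][m]
            distance[order[k]] += gap / (hi - lo)
    return distance


def tournament_winner(i, j, scores, rank, distance, rng):
    """Pick a parent: lower rank, then dominance, then less crowded."""
    if rank[i] != rank[j]:
        return i if rank[i] < rank[j] else j
    if dominates(scores[i], scores[j]):
        return i
    if dominates(scores[j], scores[i]):
        return j
    if distance[i] != distance[j]:
        return i if distance[i] > distance[j] else j
    return rng.choice((i, j))  # complete tie: pick at random


# Hypothetical front of four models: (-2ll, number of parameters)
scores = [(1000, 10), (1010, 9), (1020, 8), (1040, 7)]
rank = [1, 1, 1, 1]
distance = crowding_distance(scores)
rng = random.Random(0)
winner = tournament_winner(1, 2, scores, rank, distance, rng)
print(distance, winner)
```

In this example models 1 and 2 share a rank and neither dominates the other, so the less crowded one (model 2, with the larger distance) wins.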

This process of selecting each model randomly and comparing it to another random model is repeated a second time in order to preserve the number of models in the next generation. Two “parents” are selected in this way (with two binary tournaments). These parents are then used to create the next generation, after crossing over and mutating the bit strings. The specific algorithm used in this example is NSGA-II (Non-dominated Sorting Genetic Algorithm II) [3]. This process is repeated until the non-dominated set appears stable. The Pareto set of models in the non-dominated fronts (one or more non-dominated fronts) at the conclusion of the optimization can then be presented to the user for additional evaluation (based on biological plausibility, diagnostic plots, etc.).
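
The crossover and mutation step mentioned above can be illustrated with a short sketch. The abstract does not give the bit-string encoding, the crossover type or the mutation rate, so single-point crossover, a 1% bit-flip rate and a 28-bit string are assumptions made only for illustration (28 bits is chosen because 2^28 is about 2.68 × 10^8, which happens to match the stated size of the solution space).

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the tails of two equal-length bit strings."""
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


rng = random.Random(0)
n_bits = 28  # assumed length; each bit pattern would select model features

parent_a = [rng.randint(0, 1) for _ in range(n_bits)]
parent_b = [rng.randint(0, 1) for _ in range(n_bits)]

crossed_a, crossed_b = crossover(parent_a, parent_b, rng)
child_a = mutate(crossed_a, 0.01, rng)
child_b = mutate(crossed_b, 0.01, rng)
print(child_a)
print(child_b)
```

Each child bit string would then be translated into a NONMEM control stream, run, and scored on the objectives before the next round of ranking.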

Results: Figure 1 shows a plot of the values of -2ll vs. number of parameters in the first front for the 1st, 5th, 15th and 16th generations. This plot shows an essential tradeoff in modeling: goodness of fit vs. parsimony. It can be seen that in iterating from generations 1 to 5 to 15 the first front (set of non-dominated models) improves in the value of -2ll and/or number of parameters (shifting left and down, lower -2ll and fewer parameters). From generation 15 to generation 16 little additional improvement is seen, suggesting that convergence has occurred, with the non-dominated solution set being identified. Also note that anywhere along this front, the slope (or -1 × slope, the decrease in -2ll per additional parameter) exceeds the AIC value for non-hierarchical models.
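
For reference, the AIC comparison alluded to above can be written out using the standard definition of AIC, with p denoting the number of estimated parameters (the symbol p and the subscript notation are introduced here for illustration only). Adding one parameter lowers AIC only when the drop in -2ll exceeds 2:

```latex
\mathrm{AIC} = -2\mathrm{ll} + 2p
\qquad\Longrightarrow\qquad
\mathrm{AIC}_{p+1} < \mathrm{AIC}_{p}
\;\iff\;
(-2\mathrm{ll})_{p} - (-2\mathrm{ll})_{p+1} > 2
```

Read this way, the statement about the slope says that each additional parameter along the first front buys a decrease in -2ll larger than this threshold.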


Figure 2 shows a similar plot for number of parameters vs. -log(NPDE global p value). The p value for NPDE is presented as -log() for convenience in plotting only. The tradeoff between NPDE and number of parameters is not as clear as is seen for -2ll vs. number of parameters, but a subtle relationship is still observed, with more complex models (more parameters) tending to have a better p value for NPDE. Interestingly, the NPDE p value falls (larger -log(NPDE)) at the largest values for number of parameters, perhaps suggesting overparameterization.

Conclusions: A multi-objective evolutionary algorithm is capable of identifying a Pareto set of non-dominated population PK models. The objectives used are -2ll, number of parameters, “quality” of the model results and the global p value from NPDE. Additional objectives that could be added include simulation-based metrics such as PPC. The results of this Pareto search can then be presented to the user, possibly sorted by other criteria (AIC, DIC, NPDE, successful convergence, successful covariance, etc.), with appropriate graphics generated for each, to be used for additional model selection.

References:

[1] Vilfredo Pareto. Manuale di economia politica. 1906.

[2] Robert R. Bies, Matthew F. Muldoon, Bruce G. Pollock, Steven Manuck, Gwenn Smith and Mark E. Sale. A Genetic Algorithm-Based, Hybrid Machine Learning Approach to Model Selection. Journal of Pharmacokinetics and Pharmacodynamics, 33(2), 195-221, 2006.

[3] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. A Fast and Elitist Multi-objective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 2002.

[4] Emmanuelle Comets, France Mentré. Using simulations-based metrics to detect model misspecifications. PAGE meeting 2010.



Figure 1


Figure 2