Journal of Global Optimization 11: 341–359, 1997.
© 1997 Kluwer Academic Publishers. Printed in the Netherlands.
Differential Evolution – A Simple and Efficient
Heuristic for Global Optimization over Continuous
Spaces
RAINER STORN
Siemens AG, ZFE T SN2, Otto-Hahn Ring 6, D-81739 Muenchen, Germany.
(e-mail: rainer.storn@mchp.siemens.de)
KENNETH PRICE
836 Owl Circle, Vacaville, CA 95687, U.S.A. (email: kprice@solano.community.net)
(Received: 20 March 1996; accepted: 19 November 1996)
Abstract. A new heuristic approach for minimizing possibly nonlinear and non-differentiable con-
tinuous space functions is presented. By means of an extensive testbed it is demonstrated that the
new method converges faster and with more certainty than many other acclaimed global optimization
methods. The new method requires few control variables, is robust, easy to use, and lends itself very
well to parallel computation.
Key words: Stochastic optimization, nonlinear optimization, global optimization, genetic algorithm,
evolution strategy.
1. Introduction
Problems which involve global optimization over continuous spaces are ubiqui-
tous throughout the scientific community. In general, the task is to optimize certain
properties of a system by pertinently choosing the system parameters. For con-
venience, a system’s parameters are usually represented as a vector. The standard
approach to an optimization problem begins by designing an objective function
that can model the problem’s objectives while incorporating any constraints. Espe-
cially in the circuit design community, methods are in use which do not need an
objective function but operate with so-called regions of acceptability: Brayton et
al. (1981), Lueder (1990), Storn (1995). Although these methods can make for-
mulating a problem simpler, they are usually inferior to techniques which make
use of an objective function. Consequently, we will only concern ourselves with
optimization methods that use an objective function. In most cases, the objective
function defines the optimization problem as a minimization task. To this end, the
following investigation is further restricted to minimization problems. For such
problems, the objective function is more accurately called a “cost” function.
When the cost function is nonlinear and non-differentiable, direct search ap-
proaches are the methods of choice. The best known of these are the algorithms
by Nelder and Mead: Bunday et al. (1987), by Hooke and Jeeves: Bunday et al.
(1987), genetic algorithms (GAs): Goldberg (1989), and evolution strategies (ESs):
Rechenberg (1973), Schwefel (1995). Central to every direct search method is a
strategy that generates variations of the parameter vectors. Once a variation is gen-
erated, a decision must then be made whether or not to accept the newly derived
parameters. Most standard direct search methods use the greedy criterion to make
this decision. Under the greedy criterion, a new parameter vector is accepted if and
only if it reduces the value of the cost function. Although the greedy decision pro-
cess converges fairly fast, it runs the risk of becoming trapped in a local minimum.
Inherently parallel search techniques like genetic algorithms and evolution strate-
gies have some built-in safeguards to forestall misconvergence. By running several
vectors simultaneously, superior parameter configurations can help other vectors
escape local minima. Another method which can extricate a parameter vector from
a local minimum is Simulated Annealing: Ingber (1992), Ingber (1993), Press et
al. (1992). Annealing relaxes the greedy criterion by occasionally permitting an
uphill move. Such moves potentially allow a parameter vector to climb out of a
local minimum. As the number of iterations increases, the probability of accepting
a large uphill move decreases. In the long run, this leads to the greedy criterion.
While all direct search methods lend themselves to annealing, it has primarily been
used just for the Random Walk, which itself is the simplest case of an evolutionary
algorithm: Rechenberg (1973). Nevertheless, attempts have been made to anneal
other direct searches like the method of Nelder and Mead: Press et al. (1992) and
genetic algorithms: Ingber (1993), Price (1994). Users generally demand that a
practical minimization technique should fulfill the following four requirements:
(1) Ability to handle non-differentiable, nonlinear and multimodal cost functions.
(2) Parallelizability to cope with computation intensive cost functions.
(3) Ease of use, i.e. few control variables to steer the minimization. These variables
should also be robust and easy to choose.
(4) Good convergence properties, i.e. consistent convergence to the global mini-
mum in consecutive independent trials.
As explained in the following, the novel minimization method Differential Evolu-
tion (DE) was designed to fulfill all of the above requirements.
To fulfill requirement (1) DE was designed to be a stochastic direct search
method. Direct search methods also have the advantage of being easily applied
to experimental minimization where the cost value is derived from a physical
experiment rather than a computer simulation. Physical experiments were indeed
the motivation for the development of ESs by Rechenberg et al.
Requirement (2) is important for computationally demanding optimizations
where, for example, one evaluation of the cost function might take from minutes to
hours, as is often the case in integrated circuit design or finite element simulation.
In order to obtain usable results in a reasonable amount of time, the only viable
approach is to resort to a parallel computer or a network of computers. DE fulfills
requirement (2) by using a vector population where the stochastic perturbation of
the population vectors can be done independently.
In order to satisfy requirement (3) it is advantageous if the minimization method
is self-organizing so that very little input is required from the user. The method
of Nelder and Mead: Bunday et al. (1987), is a good example of a self-organizing
minimizer. If the cost function at hand has D parameters, Nelder and Mead’s
method uses a polyhedron with D+1 vertices to define the current search space.
Each vertex is represented by a D-dimensional parameter vector which samples
the cost function. New parameter vectors are generated by so-called reflections
of the vectors with highest cost and contractions around low cost vectors. The
new vectors replace their predecessors if they correspond to a function value
with reduced cost compared to their predecessors. This strategy allows the search
space, i.e. the polyhedron, to expand and contract without special control variable
settings by the user. Unfortunately, Nelder & Mead’s method is basically a local
minimization method, which, as our experiments suggest, is not powerful enough
for global minimization tasks, even if the concept of annealing is introduced. Yet
DE borrows the idea from Nelder & Mead of employing information from within
the vector population to alter the search space. DE’s self-organizing scheme takes
the difference vector of two randomly chosen population vectors to perturb an
existing vector. The perturbation is done for every population vector. This crucial
idea is in contrast to the method used by traditional ESs in which predetermined
probability distribution functions determine vector perturbations.
Last but not least, the good convergence properties demanded in requirement
(4) are mandatory for a good minimization algorithm. Although many approaches
exist to theoretically describe the convergence properties of a global minimization
method, only extensive testing under various conditions can show whether a mini-
mization method can fulfill its promises. DE scores very well in this regard as will
be explained in detail in Section 3.
2. Differential Evolution
Differential Evolution (DE) is a parallel direct search method which utilizes NP
D-dimensional parameter vectors
x_{i,G},  i = 1, 2, ..., NP  (1)
as a population for each generation G. NP does not change during the minimization
process. The initial vector population is chosen randomly and should cover the entire
parameter space. As a rule, we will assume a uniform probability distribution for
all random decisions unless otherwise stated. In case a preliminary solution is
available, the initial population might be generated by adding normally distributed
random deviations to the nominal solution x_{nom,0}. DE generates new parameter
vectors by adding the weighted difference between two population vectors to a
third vector. Let this operation be called mutation. The mutated vector’s parameters
Figure 1. An example of a two-dimensional cost function showing its contour lines and the process for generating v_{i,G+1}.
are then mixed with the parameters of another predetermined vector, the target
vector, to yield the so-called trial vector. Parameter mixing is often referred to as
“crossover” in the ES-community and will be explained later in more detail. If the
trial vector yields a lower cost function value than the target vector, the trial vector
replaces the target vector in the following generation. This last operation is called
selection. Each population vector has to serve once as the target vector so that NP
competitions take place in one generation.
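The random population initialization described above can be sketched as follows. This is a hypothetical helper of our own, with NP, D, the bounds, and the deviation width chosen freely:

```python
import random

def init_population(NP, D, ipr_low, ipr_high, x_nom=None, sigma=0.1,
                    rng=random.Random(0)):
    """Create the initial population: uniform over the IPR, or normally
    distributed deviations around a nominal solution if one is given."""
    if x_nom is None:
        # no prior knowledge: cover the parameter space uniformly
        return [[rng.uniform(ipr_low, ipr_high) for _ in range(D)]
                for _ in range(NP)]
    # preliminary solution available: perturb it with Gaussian noise
    return [[x_nom[j] + rng.gauss(0.0, sigma) for j in range(D)]
            for _ in range(NP)]
```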
More specifically DE’s basic strategy can be described as follows:
Mutation
For each target vector x_{i,G}, i = 1, 2, 3, ..., NP, a mutant vector is generated
according to

v_{i,G+1} = x_{r1,G} + F · (x_{r2,G} − x_{r3,G})  (2)

with random indexes r1, r2, r3 ∈ {1, 2, ..., NP}, integer, mutually different and
F > 0. The randomly chosen integers r1, r2 and r3 are also chosen to be different
from the running index i, so that NP must be greater or equal to four to allow for this
condition. F is a real and constant factor ∈ [0, 2] which controls the amplification
of the differential variation (x_{r2,G} − x_{r3,G}). Figure 1 shows a two-dimensional
example that illustrates the different vectors which play a part in the generation of
v_{i,G+1}.
Crossover
In order to increase the diversity of the perturbed parameter vectors, crossover is
introduced. To this end, the trial vector:

u_{i,G+1} = (u_{1i,G+1}, u_{2i,G+1}, ..., u_{Di,G+1})  (3)
Figure 2. Illustration of the crossover process for D = 7 parameters.
is formed, where

u_{ji,G+1} = v_{ji,G+1}   if (randb(j) ≤ CR) or j = rnbr(i)
u_{ji,G+1} = x_{ji,G}     if (randb(j) > CR) and j ≠ rnbr(i),
j = 1, 2, ..., D.  (4)

In (4), randb(j) is the jth evaluation of a uniform random number generator with
outcome ∈ [0, 1]. CR is the crossover constant ∈ [0, 1] which has to be determined
by the user. rnbr(i) is a randomly chosen index ∈ {1, 2, ..., D} which ensures that
u_{i,G+1} gets at least one parameter from v_{i,G+1}. Figure 2 gives an example of the
crossover mechanism for 7-dimensional vectors.
Selection
To decide whether or not it should become a member of generation G+1, the trial
vector u_{i,G+1} is compared to the target vector x_{i,G} using the greedy criterion. If
vector u_{i,G+1} yields a smaller cost function value than x_{i,G}, then x_{i,G+1} is set to
u_{i,G+1}; otherwise, the old value x_{i,G} is retained.
Pseudocode
The simplicity of DE is best explained via some C-style pseudocode which is given
in Figure 3.
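Since Figure 3 cannot be reproduced here, the following is our own minimal Python transcription of the same DE/rand/1/bin loop, combining mutation (2), crossover (4), and greedy selection; function names and parameter defaults are our choices, not the authors':

```python
import random

def differential_evolution(cost, D, NP=15, F=0.5, CR=0.9, gmax=300,
                           ipr=(-5.12, 5.12), seed=1):
    """Minimal DE/rand/1/bin: mutation (2), crossover (4), greedy selection."""
    rng = random.Random(seed)
    lo, hi = ipr
    # initialize the population uniformly over the IPR
    pop = [[rng.uniform(lo, hi) for _ in range(D)] for _ in range(NP)]
    costs = [cost(x) for x in pop]
    for _ in range(gmax):
        for i in range(NP):
            # mutation: three distinct indexes, all different from i
            r1, r2, r3 = rng.sample([r for r in range(NP) if r != i], 3)
            rnbr = rng.randrange(D)
            # crossover folded into the trial-vector construction
            trial = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                     if (rng.random() <= CR or j == rnbr) else pop[i][j]
                     for j in range(D)]
            c = cost(trial)
            if c <= costs[i]:          # greedy selection
                pop[i], costs[i] = trial, c
    best = min(range(NP), key=costs.__getitem__)
    return pop[best], costs[best]
```

On the first De Jong sphere function, for instance, this sketch drives the cost well below the VTR within a few hundred generations.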
Other variants of DE
The above scheme is not the only variant of DE which has proven to be useful. In
order to classify the different variants, the notation:
DE=x=y=z
is introduced where
Figure 3. The DE search engine in 19 lines of C-style pseudocode.
x specifies the vector to be mutated which currently can be “rand” (a randomly
chosen population vector) or “best” (the vector of lowest cost from the current
population).
y is the number of difference vectors used.
z denotes the crossover scheme. The current variant is “bin” (crossover due to
independent binomial experiments as explained in Section 2).
Using this notation, the basic DE-strategy described in the previous section can
be written as: DE/rand/1/bin. This is the DE-variant we used for all performance
comparisons later on. Nevertheless, one highly beneficial method that deserves
special mention is the method DE/best/2/bin: Price (1996), where
v_{i,G+1} = x_{best,G} + F · (x_{r1,G} + x_{r2,G} − x_{r3,G} − x_{r4,G}).  (5)
The usage of two difference vectors seems to improve the diversity of the population
if the number of population vectors NP is high enough.
3. Comparison with Other Minimization Methods
DE has already participated in the First International IEEE Competition on Evo-
lutionary Optimization, the 1st ICEO: Price, K. and Storn, R. (1996). Among
conference entries, DE proved to be the fastest evolutionary algorithm, although it
did place third in speed behind two deterministic methods of limited application.
Encouraged by these results, we looked for more minimization methods which
claim to work effectively on real functions. The number of different minimization
methods published is enormous: Schwefel (1995), so we confined ourselves to the
most prominent ones.
3.1. ANNEALING METHODS
Two annealing methods were chosen to compete with DE, the Annealed Nelder and
Mead strategy (ANM): Press et al. (1992), and Adaptive Simulated Annealing (ASA):
Ingber (1993). We used readily available source code to test both annealing algo-
rithms on a testbed which we think is challenging for global optimization methods.
The first method, ANM, is appealing because of its adaptive scheme for gener-
ating random parameter deviations. When the annealing part is switched off, a fast
converging direct search method remains which is especially useful in cases where
local minimization suffices. The basic control variables in ANM are T, the starting
temperature, TF, the temperature reduction factor and NV, the number of random
variations at a given temperature level.
The second method, ASA, claims to converge very quickly and to outperform
GAs on the De Jong test suite: Ingber (1992). Although ASA provides more
than a dozen control variables, it turned out that just two of them,
TEMPERATURE RATIO SCALE (TRS) and TEMPERATURE ANNEAL SCALE (TAS),
had significant impact on the minimization process.
Testbed #1
Our function testbed contains the De Jong test functions in a slightly modified
fashion plus some additional functions which exhibit further distinctive difficulties
for a global minimization algorithm. For all functions an initial parameter range,
IPR, and a value to reach, VTR, were defined. At the beginning of the optimization
the initial parameter values are drawn randomly from the IPR. If a minimizer gets
below the VTR, the problem is assumed to be solved.
(1) First De Jong function (sphere)

f1(x) = Σ_{j=1}^{3} x_j²,  IPR: x_j ∈ [−5.12, 5.12]  (6)
f1(x) is considered to be a very simple task for every serious minimization
method. The global minimum is f1(0) = 0 and the VTR was set to 1.e−6.
(2) Second De Jong function (Rosenbrock’s saddle)

f2(x) = 100 · (x1² − x2)² + (1 − x1)²,  IPR: x_j ∈ [−2.048, 2.048]  (7)

Although f2(x) has just two parameters, it has the reputation of being a difficult
minimization problem. The global minimum is f2(1, 1) = 0 and VTR = 1.e−6.
(3) Modified third De Jong function (step)

f3(x) = 30 + Σ_{j=1}^{5} ⌊x_j⌋                              ∀ x_j ∈ IPR
f3(x) = Π_{(x_j ∉ IPR) ∧ (x_j < 0)} 30 · sgn(−x_j − 5.12)   otherwise,
IPR: x_j ∈ [−5.12, 5.12]  (8)

The global minimum is f3(−5 − ε) = 0 where ε ∈ [0, 0.12]. Again, the
VTR was chosen to be 1.e−6. The step function exhibits many plateaus which
pose a considerable problem for many minimization algorithms. The original
step function does not contain the product of signum functions which causes
the minimum to occur at minus infinity. In order to define a unique global
minimum, however, the signum functions were included.
(4) Modified fourth De Jong function (quartic)

f4(x) = Σ_{j=1}^{30} (j · x_j⁴ + η),  IPR: x_j ∈ [−1.28, 1.28]  (9)

This function is designed to test the behavior of a minimization algorithm in
the presence of noise. In the original De Jong function, η is a random variable
produced by Gaussian noise having the distribution N(0, 1). According to
Ingber (1992), this function appears to be flawed as no definite global minimum
exists. In response to the problem, we followed the suggestion given in Ingber
(1992) and chose η to be a random variable with uniform distribution and
bounded by [0, 1). In contrast to the original version of De Jong’s quartic
function, we have also included η inside the summation instead of just adding
η to the summation result. This change makes f4(x) more difficult to minimize.
The functional minimum is f4(0) ≅ 30 · E[η] = 15 = VTR, where E[η] is the
expectation of η.
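A sketch of the modified quartic function, Equation (9), with the noise source made injectable so that the expected minimum can be checked deterministically (the parameter name eta is ours):

```python
import random

def f4(x, eta=random.random):
    """Modified quartic function, Eq. (9): a fresh noise sample eta,
    uniform over [0, 1) by default, is drawn inside every summand."""
    # the paper indexes j = 1..30, hence the (j + 1) for 0-based enumerate
    return sum((j + 1) * xj ** 4 + eta() for j, xj in enumerate(x))
```

Passing a constant eta of 0.5 at x = 0 reproduces the expected functional minimum of 15.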
(5) Fifth De Jong function (Shekel’s Foxholes)

f5(x) = 1 / (0.002 + Σ_{i=0}^{24} 1 / (i + Σ_{j=1}^{2} (x_j − a_{ij})^6)),
IPR: x_j ∈ [−65.536, 65.536]  (10)

with a_{i1} = {−32, −16, 0, 16, 32} for i = 0, 1, 2, 3, 4 and a_{i1} = a_{i mod 5, 1} as
well as a_{i2} = {−32, −16, 0, 16, 32} for i = 0, 5, 10, 15, 20 and a_{i2} = a_{i+k, 2},
k = 1, 2, 3, 4.
The global minimum for this function is f5(−32, −32) ≈ 0.998004; the VTR
was defined to be 0.998005.
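Taken literally, Equation (10) divides by zero at the best foxhole (the i = 0 term vanishes there). The sketch below therefore uses the standard De Jong weighting 1/(i + 1 + ...), an assumption on our part, which reproduces the stated minimum of about 0.998004; the function name is ours:

```python
def f5(x):
    """Shekel's Foxholes, Eq. (10): 25 foxhole centers a_ij on a 5x5 grid
    over {-32, -16, 0, 16, 32}^2.  Foxhole i is weighted 1/(i + 1 + ...)
    (standard De Jong indexing) so the minimum near (-32, -32) is ~0.998004."""
    grid = [-32.0, -16.0, 0.0, 16.0, 32.0]
    total = 0.002
    for i in range(25):
        a1, a2 = grid[i % 5], grid[i // 5]
        total += 1.0 / (i + 1 + (x[0] - a1) ** 6 + (x[1] - a2) ** 6)
    return 1.0 / total
```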
(6) Corana’s parabola: Ingber (1993), Corana et al. (1987).

f6(x) = Σ_{j=1}^{4} { 0.15 · (z_j − 0.05 · sgn(z_j))² · d_j   if |x_j − z_j| < 0.05
                    { d_j · x_j²                              otherwise,
IPR: x_j ∈ [−1000, 1000]  (11)

with

z_j = ⌊|x_j / 0.2| + 0.49999⌋ · sgn(x_j) · 0.2

and

d_j = {1, 1000, 10, 100}.

f6(x) defines a paraboloid whose axes are parallel to the coordinate axes. It
is riddled with a set of holes that increase in depth the closer one approaches
the origin. Any minimization algorithm that goes strictly downhill will almost
always be captured by the holes. The global minimum here is f6(x) = 0, with
|x_j| < 0.05, j = 1, 2, 3, 4. The VTR was defined to be 1.e−6.
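A transcription of Equation (11) in code, with the function name ours:

```python
import math

def f6(x):
    """Corana's parabola, Eq. (11): an axis-parallel paraboloid riddled
    with flat rectangular holes that deepen toward the origin."""
    d = [1.0, 1000.0, 10.0, 100.0]
    total = 0.0
    for j in range(4):
        sgn_x = (x[j] > 0) - (x[j] < 0)
        z = math.floor(abs(x[j] / 0.2) + 0.49999) * sgn_x * 0.2
        sgn_z = (z > 0) - (z < 0)
        if abs(x[j] - z) < 0.05:
            total += 0.15 * (z - 0.05 * sgn_z) ** 2 * d[j]  # inside a hole
        else:
            total += d[j] * x[j] * x[j]                     # on the paraboloid
    return total
```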
(7) Griewangk’s function: Griewangk (1981).

f7(x) = Σ_{j=1}^{10} x_j² / 4000 − Π_{j=1}^{10} cos(x_j / √j) + 1,
IPR: x_j ∈ [−400, 400]  (12)

Like test function f6(x), f7(x) has many local minima so that it is very difficult
to find the true minimum f7(0) = 0. The VTR was defined to be 1.e−6.
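Griewangk's function, Equation (12), in code (the function name is ours):

```python
import math

def f7(x):
    """Griewangk's function, Eq. (12), here for D = 10 parameters: a
    slow quadratic bowl modulated by an oscillating cosine product."""
    s = sum(xj * xj for xj in x) / 4000.0
    # the paper indexes j = 1..10, hence sqrt(j + 1) for 0-based enumerate
    p = math.prod(math.cos(xj / math.sqrt(j + 1)) for j, xj in enumerate(x))
    return s - p + 1.0
```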
(8) Zimmermann’s problem: Zimmermann (1990).
Let

h1(x) = 9 − x1 − x2,  IPR: x_j ∈ [0, 100], j = 1, 2  (13)
h2(x) = (x1 − 3)² + (x2 − 2)² − 16,  (14)
h3(x) = x1 · x2 − 14  (15)

and

p(δ) = 100 · (1 + δ).  (16)

Then

f8(x) = max{h1(x), p(h2(x)) · sgn(h2(x)), p(h3(x)) · sgn(h3(x)),
p(−x1) · sgn(−x1), p(−x2) · sgn(−x2)}.  (17)

Finding the global minimum f8(7, 2) = 0 poses a special problem, because
the minimum is located at the corner of the constrained region defined by
(x1 − 3)² + (x2 − 2)² ≤ 16 and x1 · x2 ≤ 14. The VTR was defined to be
1.e−6.
(9) Polynomial fitting problem by Storn and Price.
Let

h4(x, z) = Σ_{j=0}^{2k} x_{j+1} · z^j,  k integer and > 0,  (18)

have the coefficients x_{j+1} such that

h4(x, z) ∈ [−1, 1] for z ∈ [−1, 1]  (19)

and

h4(x, z) ≥ T_{2k}(1.2) for z = ±1.2  (20)

with T_{2k}(z) being a Chebychev Polynomial of degree 2k. The Chebychev
Polynomials are defined recursively according to the difference equation
T_{n+1}(z) = 2z · T_n(z) − T_{n−1}(z), n integer and > 0, with the initial
conditions T0(z) = 1 and T1(z) = z. The solution to the polynomial fitting
problem is, of course, h4(x, z) = T_{2k}(z), a polynomial