Journal of Global Optimization 11: 341–359, 1997.
© 1997 Kluwer Academic Publishers. Printed in the Netherlands.
Differential Evolution – A Simple and Efficient
Heuristic for Global Optimization over Continuous
Spaces
RAINER STORN
Siemens AG, ZFE T SN2, Otto-Hahn Ring 6, D-81739 Muenchen, Germany.
(e-mail: rainer.storn@mchp.siemens.de)
KENNETH PRICE
836 Owl Circle, Vacaville, CA 95687, U.S.A. (email: kprice@solano.community.net)
(Received: 20 March 1996; accepted: 19 November 1996)
Abstract. A new heuristic approach for minimizing possibly nonlinear and non-differentiable con-
tinuous space functions is presented. By means of an extensive testbed it is demonstrated that the
new method converges faster and with more certainty than many other acclaimed global optimization
methods. The new method requires few control variables, is robust, easy to use, and lends itself very
well to parallel computation.
Key words: Stochastic optimization, nonlinear optimization, global optimization, genetic algorithm,
evolution strategy.
1. Introduction
Problems which involve global optimization over continuous spaces are ubiqui-
tous throughout the scientific community. In general, the task is to optimize certain
properties of a system by pertinently choosing the system parameters. For con-
venience, a system’s parameters are usually represented as a vector. The standard
approach to an optimization problem begins by designing an objective function
that can model the problem’s objectives while incorporating any constraints. Espe-
cially in the circuit design community, methods are in use which do not need an
objective function but operate with so-called regions of acceptability: Brayton et
al. (1981), Lueder (1990), Storn (1995). Although these methods can make for-
mulating a problem simpler, they are usually inferior to techniques which make
use of an objective function. Consequently, we will only concern ourselves with
optimization methods that use an objective function. In most cases, the objective
function defines the optimization problem as a minimization task. To this end, the
following investigation is further restricted to minimization problems. For such
problems, the objective function is more accurately called a “cost” function.
When the cost function is nonlinear and non-differentiable, direct search ap-
proaches are the methods of choice. The best known of these are the algorithms
by Nelder and Mead: Bunday et al. (1987), by Hooke and Jeeves: Bunday et al.
(1987), genetic algorithms (GAs): Goldberg (1989), and evolution strategies (ESs):
Rechenberg (1973), Schwefel (1995). Central to every direct search method is a
strategy that generates variations of the parameter vectors. Once a variation is gen-
erated, a decision must then be made whether or not to accept the newly derived
parameters. Most standard direct search methods use the greedy criterion to make
this decision. Under the greedy criterion, a new parameter vector is accepted if and
only if it reduces the value of the cost function. Although the greedy decision pro-
cess converges fairly fast, it runs the risk of becoming trapped in a local minimum.
Inherently parallel search techniques like genetic algorithms and evolution strate-
gies have some built-in safeguards to forestall misconvergence. By running several
vectors simultaneously, superior parameter configurations can help other vectors
escape local minima. Another method which can extricate a parameter vector from
a local minimum is Simulated Annealing: Ingber (1992), Ingber (1993), Press et
al. (1992). Annealing relaxes the greedy criterion by occasionally permitting an
uphill move. Such moves potentially allow a parameter vector to climb out of a
local minimum. As the number of iterations increases, the probability of accepting
a large uphill move decreases. In the long run, this leads to the greedy criterion.
While all direct search methods lend themselves to annealing, it has primarily been
used just for the Random Walk, which itself is the simplest case of an evolutionary
algorithm: Rechenberg (1973). Nevertheless, attempts have been made to anneal
other direct searches like the method of Nelder and Mead: Press et al. (1992) and
genetic algorithms: Ingber (1993), Price (1994). Users generally demand that a
practical minimization technique should fulfill the following four requirements:
(1) Ability to handle non-differentiable, nonlinear and multimodal cost functions.
(2) Parallelizability to cope with computation intensive cost functions.
(3) Ease of use, i.e. few control variables to steer the minimization. These variables
should also be robust and easy to choose.
(4) Good convergence properties, i.e. consistent convergence to the global mini-
mum in consecutive independent trials.
As explained in the following, the novel minimization method Differential Evolu-
tion (DE) was designed to fulfill all of the above requirements.
To fulfill requirement (1) DE was designed to be a stochastic direct search
method. Direct search methods also have the advantage of being easily applied
to experimental minimization where the cost value is derived from a physical
experiment rather than a computer simulation. Physical experiments were indeed
the motivation for the development of ESs by Rechenberg et al.
Requirement (2) is important for computationally demanding optimizations
where, for example, one evaluation of the cost function might take from minutes to
hours, as is often the case in integrated circuit design or finite element simulation.
In order to obtain usable results in a reasonable amount of time, the only viable
approach is to resort to a parallel computer or a network of computers. DE fulfills
requirement (2) by using a vector population where the stochastic perturbation of
the population vectors can be done independently.
In order to satisfy requirement (3) it is advantageous if the minimization method
is self-organizing so that very little input is required from the user. The method
of Nelder and Mead: Bunday et al. (1987), is a good example of a self-organizing
minimizer. If the cost function at hand has D parameters, Nelder and Mead’s
method uses a polyhedron with D+1 vertices to define the current search space.
Each vertex is represented by a D-dimensional parameter vector which samples
the cost function. New parameter vectors are generated by so-called reflections
of the vectors with highest cost and contractions around low cost vectors. The
new vectors replace their predecessors if they correspond to a function value
with reduced cost compared to their predecessors. This strategy allows the search
space, i.e. the polyhedron, to expand and contract without special control variable
settings by the user. Unfortunately, Nelder & Mead’s method is basically a local
minimization method, which, as our experiments suggest, is not powerful enough
for global minimization tasks, even if the concept of annealing is introduced. Yet
DE borrows the idea from Nelder & Mead of employing information from within
the vector population to alter the search space. DE’s self-organizing scheme takes
the difference vector of two randomly chosen population vectors to perturb an
existing vector. The perturbation is done for every population vector. This crucial
idea is in contrast to the method used by traditional ESs in which predetermined
probability distribution functions determine vector perturbations.
Last but not least, the good convergence properties demanded in requirement
(4) are mandatory for a good minimization algorithm. Although many approaches
exist to theoretically describe the convergence properties of a global minimization
method, only extensive testing under various conditions can show whether a mini-
mization method can fulfill its promises. DE scores very well in this regard as will
be explained in detail in Section 3.
2. Differential Evolution
Differential Evolution (DE) is a parallel direct search method which utilizes NP
D-dimensional parameter vectors
x_{i,G},  i = 1, 2, ..., NP  (1)
as a population for each generation G. NP does not change during the minimization
process. The initial vector population is chosen randomly and should cover the entire
parameter space. As a rule, we will assume a uniform probability distribution for
all random decisions unless otherwise stated. In case a preliminary solution is
available, the initial population might be generated by adding normally distributed
random deviations to the nominal solution x_{nom,0}. DE generates new parameter
vectors by adding the weighted difference between two population vectors to a
third vector. Let this operation be called mutation. The mutated vector’s parameters
Figure 1. An example of a two-dimensional cost function showing its contour lines and the process for generating v_{i,G+1}.
are then mixed with the parameters of another predetermined vector, the target
vector, to yield the so-called trial vector. Parameter mixing is often referred to as
“crossover” in the ES-community and will be explained later in more detail. If the
trial vector yields a lower cost function value than the target vector, the trial vector
replaces the target vector in the following generation. This last operation is called
selection. Each population vector has to serve once as the target vector so that NP
competitions take place in one generation.
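The random population initialization described above can be sketched as follows. This is a hypothetical helper of our own, with NP, D, the bounds, and the deviation width chosen freely:

```python
import random

def init_population(NP, D, ipr_low, ipr_high, x_nom=None, sigma=0.1,
                    rng=random.Random(0)):
    """Create the initial population: uniform over the IPR, or normally
    distributed deviations around a nominal solution if one is given."""
    if x_nom is None:
        # no prior knowledge: cover the parameter space uniformly
        return [[rng.uniform(ipr_low, ipr_high) for _ in range(D)]
                for _ in range(NP)]
    # preliminary solution available: perturb it with Gaussian noise
    return [[x_nom[j] + rng.gauss(0.0, sigma) for j in range(D)]
            for _ in range(NP)]
```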
More specifically DE’s basic strategy can be described as follows:
Mutation
For each target vector x_{i,G}, i = 1, 2, 3, ..., NP, a mutant vector is generated
according to

v_{i,G+1} = x_{r1,G} + F · (x_{r2,G} − x_{r3,G})  (2)

with random indexes r1, r2, r3 ∈ {1, 2, ..., NP}, integer, mutually different and
F > 0. The randomly chosen integers r1, r2 and r3 are also chosen to be different
from the running index i, so that NP must be greater or equal to four to allow for this
condition. F is a real and constant factor ∈ [0, 2] which controls the amplification
of the differential variation (x_{r2,G} − x_{r3,G}). Figure 1 shows a two-dimensional
example that illustrates the different vectors which play a part in the generation of
v_{i,G+1}.
Crossover
In order to increase the diversity of the perturbed parameter vectors, crossover is
introduced. To this end, the trial vector:

u_{i,G+1} = (u_{1i,G+1}, u_{2i,G+1}, ..., u_{Di,G+1})  (3)
Figure 2. Illustration of the crossover process for D = 7 parameters.
is formed, where

u_{ji,G+1} = v_{ji,G+1}   if (randb(j) ≤ CR) or j = rnbr(i)
u_{ji,G+1} = x_{ji,G}     if (randb(j) > CR) and j ≠ rnbr(i),
j = 1, 2, ..., D.  (4)

In (4), randb(j) is the jth evaluation of a uniform random number generator with
outcome ∈ [0, 1]. CR is the crossover constant ∈ [0, 1] which has to be determined
by the user. rnbr(i) is a randomly chosen index ∈ {1, 2, ..., D} which ensures that
u_{i,G+1} gets at least one parameter from v_{i,G+1}. Figure 2 gives an example of the
crossover mechanism for 7-dimensional vectors.
Selection
To decide whether or not it should become a member of generation G+1, the trial
vector u_{i,G+1} is compared to the target vector x_{i,G} using the greedy criterion. If
vector u_{i,G+1} yields a smaller cost function value than x_{i,G}, then x_{i,G+1} is set to
u_{i,G+1}; otherwise, the old value x_{i,G} is retained.
Pseudocode
The simplicity of DE is best explained via some C-style pseudocode which is given
in Figure 3.
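Since Figure 3 cannot be reproduced here, the following is our own minimal Python transcription of the same DE/rand/1/bin loop, combining mutation (2), crossover (4), and greedy selection; function names and parameter defaults are our choices, not the authors':

```python
import random

def differential_evolution(cost, D, NP=15, F=0.5, CR=0.9, gmax=300,
                           ipr=(-5.12, 5.12), seed=1):
    """Minimal DE/rand/1/bin: mutation (2), crossover (4), greedy selection."""
    rng = random.Random(seed)
    lo, hi = ipr
    # initialize the population uniformly over the IPR
    pop = [[rng.uniform(lo, hi) for _ in range(D)] for _ in range(NP)]
    costs = [cost(x) for x in pop]
    for _ in range(gmax):
        for i in range(NP):
            # mutation: three distinct indexes, all different from i
            r1, r2, r3 = rng.sample([r for r in range(NP) if r != i], 3)
            rnbr = rng.randrange(D)
            # crossover folded into the trial-vector construction
            trial = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                     if (rng.random() <= CR or j == rnbr) else pop[i][j]
                     for j in range(D)]
            c = cost(trial)
            if c <= costs[i]:          # greedy selection
                pop[i], costs[i] = trial, c
    best = min(range(NP), key=costs.__getitem__)
    return pop[best], costs[best]
```

On the first De Jong sphere function, for instance, this sketch drives the cost well below the VTR within a few hundred generations.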
Other variants of DE
The above scheme is not the only variant of DE which has proven to be useful. In
order to classify the different variants, the notation:
DE=x=y=z
is introduced where
Figure 3. The DE search engine in 19 lines of C-style pseudocode.
x specifies the vector to be mutated which currently can be “rand” (a randomly
chosen population vector) or “best” (the vector of lowest cost from the current
population).
y is the number of difference vectors used.
z denotes the crossover scheme. The current variant is “bin” (crossover due to
independent binomial experiments as explained in Section 2).
Using this notation, the basic DE-strategy described in the previous section can
be written as: DE/rand/1/bin. This is the DE-variant we used for all performance
comparisons later on. Nevertheless, one highly beneficial method that deserves
special mention is the method DE/best/2/bin: Price (1996), where
v_{i,G+1} = x_{best,G} + F · (x_{r1,G} + x_{r2,G} − x_{r3,G} − x_{r4,G}).  (5)
The usage of two difference vectors seems to improve the diversity of the population
if the number of population vectors NP is high enough.
3. Comparison with Other Minimization Methods
DE has already participated in the First International IEEE Competition on Evo-
lutionary Optimization, the 1st ICEO: Price, K. and Storn, R. (1996). Among
conference entries, DE proved to be the fastest evolutionary algorithm, although it
did place third in speed behind two deterministic methods of limited application.
Encouraged by these results, we looked for more minimization methods which
claim to work effectively on real functions. The number of different minimization
methods published is enormous: Schwefel (1995), so we confined ourselves to the
most prominent ones.
3.1. ANNEALING METHODS
Two annealing methods were chosen to compete with DE, the Annealed Nelder and
Mead strategy (ANM): Press et al. (1992), and Adaptive Simulated Annealing (ASA):
Ingber (1993). We used readily available source code to test both annealing algo-
rithms on a testbed which we think is challenging for global optimization methods.
The first method, ANM, is appealing because of its adaptive scheme for gener-
ating random parameter deviations. When the annealing part is switched off, a fast
converging direct search method remains which is especially useful in cases where
local minimization suffices. The basic control variables in ANM are T, the starting
temperature, TF, the temperature reduction factor and NV, the number of random
variations at a given temperature level.
The second method, ASA, claims to converge very quickly and to outperform
GAs on the De Jong test suite: Ingber (1992). Although ASA provides more
than a dozen control variables, it turned out that just two of them,
TEMPERATURE RATIO SCALE (TRS) and TEMPERATURE ANNEAL SCALE (TAS),
had significant impact on the minimization process.
Testbed #1
Our function testbed contains the De Jong test functions in a slightly modified
fashion plus some additional functions which exhibit further distinctive difficulties
for a global minimization algorithm. For all functions an initial parameter range,
IPR, and a value to reach, VTR, were defined. At the beginning of the optimization
the initial parameter values are drawn randomly from the IPR. If a minimizer gets
below the VTR, the problem is assumed to be solved.
(1) First De Jong function (sphere)

f1(x) = Σ_{j=1}^{3} x_j²,  IPR: x_j ∈ [−5.12, 5.12]  (6)
f1(x) is considered to be a very simple task for every serious minimization
method. The global minimum is f1(0) = 0 and the VTR was set to 1.e−6.
(2) Second De Jong function (Rosenbrock’s saddle)

f2(x) = 100 · (x1² − x2)² + (1 − x1)²,  IPR: x_j ∈ [−2.048, 2.048]  (7)

Although f2(x) has just two parameters, it has the reputation of being a difficult
minimization problem. The global minimum is f2(1, 1) = 0 and VTR = 1.e−6.
(3) Modified third De Jong function (step)

f3(x) = 30 + Σ_{j=1}^{5} ⌊x_j⌋                              ∀ x_j ∈ IPR
f3(x) = Π_{(x_j ∉ IPR) ∧ (x_j < 0)} 30 · sgn(−x_j − 5.12)   otherwise,
IPR: x_j ∈ [−5.12, 5.12]  (8)

The global minimum is f3(−5 − ε) = 0 where ε ∈ [0, 0.12]. Again, the
VTR was chosen to be 1.e−6. The step function exhibits many plateaus which
pose a considerable problem for many minimization algorithms. The original
step function does not contain the product of signum functions which causes
the minimum to occur at minus infinity. In order to define a unique global
minimum, however, the signum functions were included.
(4) Modified fourth De Jong function (quartic)

f4(x) = Σ_{j=1}^{30} (j · x_j⁴ + η),  IPR: x_j ∈ [−1.28, 1.28]  (9)

This function is designed to test the behavior of a minimization algorithm in
the presence of noise. In the original De Jong function, η is a random variable
produced by Gaussian noise having the distribution N(0, 1). According to
Ingber (1992), this function appears to be flawed as no definite global minimum
exists. In response to the problem, we followed the suggestion given in Ingber
(1992) and chose η to be a random variable with uniform distribution and
bounded by [0, 1). In contrast to the original version of De Jong’s quartic
function, we have also included η inside the summation instead of just adding
η to the summation result. This change makes f4(x) more difficult to minimize.
The functional minimum is f4(0) ≅ 30 · E[η] = 15 = VTR, where E[η] is the
expectation of η.
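A sketch of the modified quartic function, Equation (9), with the noise source made injectable so that the expected minimum can be checked deterministically (the parameter name eta is ours):

```python
import random

def f4(x, eta=random.random):
    """Modified quartic function, Eq. (9): a fresh noise sample eta,
    uniform over [0, 1) by default, is drawn inside every summand."""
    # the paper indexes j = 1..30, hence the (j + 1) for 0-based enumerate
    return sum((j + 1) * xj ** 4 + eta() for j, xj in enumerate(x))
```

Passing a constant eta of 0.5 at x = 0 reproduces the expected functional minimum of 15.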
(5) Fifth De Jong function (Shekel’s Foxholes)

f5(x) = 1 / (0.002 + Σ_{i=0}^{24} 1 / (i + Σ_{j=1}^{2} (x_j − a_{ij})^6)),
IPR: x_j ∈ [−65.536, 65.536]  (10)

with a_{i1} = {−32, −16, 0, 16, 32} for i = 0, 1, 2, 3, 4 and a_{i1} = a_{i mod 5, 1} as
well as a_{i2} = {−32, −16, 0, 16, 32} for i = 0, 5, 10, 15, 20 and a_{i2} = a_{i+k, 2},
k = 1, 2, 3, 4.
The global minimum for this function is f5(−32, −32) ≈ 0.998004; the VTR
was defined to be 0.998005.
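Taken literally, Equation (10) divides by zero at the best foxhole (the i = 0 term vanishes there). The sketch below therefore uses the standard De Jong weighting 1/(i + 1 + ...), an assumption on our part, which reproduces the stated minimum of about 0.998004; the function name is ours:

```python
def f5(x):
    """Shekel's Foxholes, Eq. (10): 25 foxhole centers a_ij on a 5x5 grid
    over {-32, -16, 0, 16, 32}^2.  Foxhole i is weighted 1/(i + 1 + ...)
    (standard De Jong indexing) so the minimum near (-32, -32) is ~0.998004."""
    grid = [-32.0, -16.0, 0.0, 16.0, 32.0]
    total = 0.002
    for i in range(25):
        a1, a2 = grid[i % 5], grid[i // 5]
        total += 1.0 / (i + 1 + (x[0] - a1) ** 6 + (x[1] - a2) ** 6)
    return 1.0 / total
```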
(6) Corana’s parabola: Ingber (1993), Corana et al. (1987).

f6(x) = Σ_{j=1}^{4} { 0.15 · (z_j − 0.05 · sgn(z_j))² · d_j   if |x_j − z_j| < 0.05
                    { d_j · x_j²                              otherwise,
IPR: x_j ∈ [−1000, 1000]  (11)

with

z_j = ⌊|x_j / 0.2| + 0.49999⌋ · sgn(x_j) · 0.2

and

d_j = {1, 1000, 10, 100}.

f6(x) defines a paraboloid whose axes are parallel to the coordinate axes. It
is riddled with a set of holes that increase in depth the closer one approaches
the origin. Any minimization algorithm that goes strictly downhill will almost
always be captured by the holes. The global minimum here is f6(x) = 0, with
|x_j| < 0.05, j = 1, 2, 3, 4. The VTR was defined to be 1.e−6.
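A transcription of Equation (11) in code, with the function name ours:

```python
import math

def f6(x):
    """Corana's parabola, Eq. (11): an axis-parallel paraboloid riddled
    with flat rectangular holes that deepen toward the origin."""
    d = [1.0, 1000.0, 10.0, 100.0]
    total = 0.0
    for j in range(4):
        sgn_x = (x[j] > 0) - (x[j] < 0)
        z = math.floor(abs(x[j] / 0.2) + 0.49999) * sgn_x * 0.2
        sgn_z = (z > 0) - (z < 0)
        if abs(x[j] - z) < 0.05:
            total += 0.15 * (z - 0.05 * sgn_z) ** 2 * d[j]  # inside a hole
        else:
            total += d[j] * x[j] * x[j]                     # on the paraboloid
    return total
```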
(7) Griewangk’s function: Griewangk (1981).

f7(x) = Σ_{j=1}^{10} x_j² / 4000 − Π_{j=1}^{10} cos(x_j / √j) + 1,
IPR: x_j ∈ [−400, 400]  (12)

Like test function f6(x), f7(x) has many local minima so that it is very difficult
to find the true minimum f7(0) = 0. The VTR was defined to be 1.e−6.
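Griewangk's function, Equation (12), in code (the function name is ours):

```python
import math

def f7(x):
    """Griewangk's function, Eq. (12), here for D = 10 parameters: a
    slow quadratic bowl modulated by an oscillating cosine product."""
    s = sum(xj * xj for xj in x) / 4000.0
    # the paper indexes j = 1..10, hence sqrt(j + 1) for 0-based enumerate
    p = math.prod(math.cos(xj / math.sqrt(j + 1)) for j, xj in enumerate(x))
    return s - p + 1.0
```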
(8) Zimmermann’s problem: Zimmermann (1990).
Let

h1(x) = 9 − x1 − x2,  IPR: x_j ∈ [0, 100], j = 1, 2  (13)
h2(x) = (x1 − 3)² + (x2 − 2)² − 16,  (14)
h3(x) = x1 · x2 − 14  (15)

and

p(δ) = 100 · (1 + δ).  (16)

Then

f8(x) = max{h1(x), p(h2(x)) · sgn(h2(x)), p(h3(x)) · sgn(h3(x)),
p(−x1) · sgn(−x1), p(−x2) · sgn(−x2)}.  (17)

Finding the global minimum f8(7, 2) = 0 poses a special problem, because
the minimum is located at the corner of the constrained region defined by
(x1 − 3)² + (x2 − 2)² ≤ 16 and x1 · x2 ≤ 14. The VTR was defined to be
1.e−6.
(9) Polynomial fitting problem by Storn and Price.
Let

h4(x, z) = Σ_{j=0}^{2k} x_{j+1} · z^j,  k integer and > 0,  (18)

have the coefficients x_{j+1} such that

h4(x, z) ∈ [−1, 1] for z ∈ [−1, 1]  (19)

and

h4(x, z) ≥ T_{2k}(1.2) for z = ±1.2  (20)

with T_{2k}(z) being a Chebychev Polynomial of degree 2k. The Chebychev
Polynomials are defined recursively according to the difference equation
T_{n+1}(z) = 2z · T_n(z) − T_{n−1}(z), n integer and > 0, with the initial
conditions T0(z) = 1 and T1(z) = z. The solution to the polynomial fitting
problem is, of course, h4(x, z) = T_{2k}(z), a polynomial