2- Numerical optimization
Contents

1 Numerical optimization
  1.1 Optimization of single-variable functions
    1.1.1 Golden Section Search
    1.1.2 Fibonacci Search
  1.2 Algorithms for unconstrained multivariable optimization
    1.2.1 Basic concepts
      1.2.1.1 Line search algorithms
      1.2.1.2 Trust region methods
    1.2.2 Newton methods
      1.2.2.1 The algorithm
      1.2.2.2 Geometrical interpretation
      1.2.2.3 Implementation aspects
      1.2.2.4 Modified Newton method
    1.2.3 Gradient methods
      1.2.3.1 Directional derivative and the gradient
      1.2.3.2 The method of steepest descent
    1.2.4 Conjugate gradient methods
    1.2.5 Quasi-Newton methods
    1.2.6 Algorithms for minimization without derivatives
      1.2.6.1 Nelder-Mead method
      1.2.6.2 Rosenbrock method

Chapter 1 Numerical optimization

1.1 Algorithms for optimization of single-variable functions. Bracketing techniques

Consider a single-variable real-valued function f(x) for which an optimum is to be found in an interval [a, b]. Among the algorithms for univariate optimization, the golden section and Fibonacci search techniques are fast, accurate, robust and do not require derivatives (Sinha, 2007; Press et al., 2007; Mathews, 2005). They assume that the optimum is bracketed in a finite interval and that the function is unimodal on [a, b]. The algorithms are described here for finding a minimum; the problem of finding a maximum is similar and may be solved with a very slight alteration of the algorithm or by changing the sign of the function.

A function f(x) is unimodal on an interval [a, b] if it has exactly one local minimum x* on the interval and is strictly decreasing for x < x* and strictly increasing for x > x*.

Figure 1.1: Unimodal function

The general strategy of the golden section and Fibonacci methods is to find a minimum within [a, b], called the initial interval of uncertainty, by progressively reducing the interval. This is accomplished by placing experiments and eliminating parts of the initial interval that do not contain the optimum. Consider the interval divided by two points x_1 and x_2 located at the same distance from the boundaries a and b, as shown in Figure 1.2, i.e. the distance from a to x_2 is the same as the distance from x_1 to b. The function is evaluated at these two points and the interval is reduced from a to x_1 if f(x_1) ≤ f(x_2), or from x_2 to b otherwise. The procedure is iterative and stops when the length of the interval is smaller than a given tolerance ε.

Figure 1.2: Bracketing the minimum

1.1.1 Golden Section Search

The golden section search technique (Press et al., 2007; Mathews, 2005) evaluates the function at two interior points x_1 and x_2, chosen such that each of them divides the interval in a ratio r, with 1/2 < r < 1 to preserve the order a < x_1 < x_2 < b. As shown in Figure 1.2, the distance d_1 can be written as:

d_1 = x_2 − a = b − x_1    (1.1)

and it is equal to a fraction r of the original interval length, b − a. This gives:

x_2 − a = r(b − a)    (1.2)
b − x_1 = r(b − a)    (1.3)

Assume now that the function has a smaller value at x_1 than at x_2, so the new search interval is [a, x_2]. The demonstration is similar in the other case, when f(x_2) < f(x_1). A point x_3 is introduced within the new interval so that it is located at the same distance, d_2, from x_2 as x_1 is from a:

d_2 = x_1 − a = x_2 − x_3    (1.4)

The ratio r must remain constant on each new interval, which means:

x_1 − a = r(x_2 − a)    (1.5)
x_2 − x_3 = r(x_2 − a)    (1.6)

If x_2 from (1.2) and x_1 from (1.3) are replaced into (1.5), a value for r can be determined as follows:

b − r(b − a) − a = r[a + r(b − a) − a]    (1.7)

Rearranging, we get:

(b − a)(1 − r − r²) = 0    (1.8)

and since the initial length of the interval is nonzero, the ratio r is one of the roots of

1 − r − r² = 0    (1.9)

The solutions of (1.9) are:

r_1 = (−1 + √5)/2 ≈ 0.618,   r_2 = (−1 − √5)/2    (1.10)

but only r_1 obeys the initial requirement that 1/2 < r < 1.

The golden section search guarantees that after each interval reduction the minimum is bracketed in an interval whose length is 0.618 times the size of the preceding one. Algorithm 1 summarizes the technique for finding the minimum of a single-variable function f(x) within a given interval [a, b] and with a tolerance ε.

Algorithm 1 Golden Section Search
  Define function f(x)
  Define boundaries a, b and tolerance ε
  d = b − a
  while b − a ≥ ε do
    d ← d × 0.618
    x_1 ← b − d
    x_2 ← a + d
    if f(x_1) ≤ f(x_2) then
      b ← x_2
    else
      a ← x_1
    end if
  end while
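As an illustration, a minimal Python sketch of Algorithm 1 is given below. The routine name golden_section, the default tolerance and the test function are choices made for this sketch and are not part of the original text; like Algorithm 1, it recomputes both interior points at every iteration rather than reusing one function evaluation.

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] with the golden section search.

    Follows Algorithm 1: both interior points are recomputed at every
    iteration, so two function evaluations are performed per step.
    """
    r = (math.sqrt(5) - 1) / 2           # golden ratio, approximately 0.618
    d = b - a
    while b - a >= tol:
        d = r * d                        # shrink the working distance
        x1, x2 = b - d, a + d            # interior points, a < x1 < x2 < b
        if f(x1) <= f(x2):
            b = x2                       # minimum lies in [a, x2]
        else:
            a = x1                       # minimum lies in [x1, b]
    return (a + b) / 2                   # midpoint of the final interval

if __name__ == "__main__":
    # Example: the minimum of f(x) = (x - 2)^2 + 1 on [0, 5] is at x = 2
    print(golden_section(lambda x: (x - 2) ** 2 + 1, 0.0, 5.0))
```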
1.1.2 Fibonacci Search

Fibonacci search is an optimization method similar to the golden section search, used to find the minimum or maximum of a unimodal function f(x) over a closed interval [a, b] (Jeter, 1986; Pierre, 1986; Luenberger, 2003). It is the strategy that yields the smallest possible interval of uncertainty in which the optimum lies after n tests are completed (Pierre, 1986), or, equivalently, it ensures that the final interval is within the user-specified tolerance ε in a number n of iterations. The method is based on the Fibonacci numbers, defined by:

F_0 = F_1 = 1,   F_k = F_{k−1} + F_{k−2},   k = 2, 3, ...    (1.11)

The first numbers of this sequence are 1, 1, 2, 3, 5, 8, 13, ...

In Figure 1.3, consider the procedure at iteration k. As in the golden section search, we shall determine the ratio r_k, which in this case is not constant, for a given number of iterations n (Mathews, 2005).

Figure 1.3: Fibonacci search

The function f is evaluated at x_{1k} and x_{2k} and the interval is reduced around the minimum. Without loss of generality, assume the minimum is at x_{1k}, so the new search interval is [a_{k+1} = a_k, b_{k+1} = x_{2k}]. The point x_{1k} is relabelled as x_{2,k+1}. At each iteration the interval is reduced by a ratio 1/2 < r_k < 1, so the distance d_k is calculated as:

d_k = r_k(b_k − a_k) = r_k d_{k−1}    (1.12)

and each new interval has the length d_k. We shall obtain a relation between r_k and r_{k+1}, starting from the observation that:

x_{2k} − x_{1k} = b_{k+1} − x_{2,k+1}    (1.13)

From Figure 1.3, the left-hand side of relation (1.13) can be written as:

x_{2k} − x_{1k} = (b_k − a_k) − (x_{1k} − a_k) − (b_k − x_{2k})
              = (b_k − a_k) − (1 − r_k)(b_k − a_k) − (1 − r_k)(b_k − a_k)
              = (b_k − a_k)(2r_k − 1)    (1.14)

and the right-hand side:

b_{k+1} − x_{2,k+1} = (1 − r_{k+1})(b_{k+1} − a_{k+1}) = (1 − r_{k+1}) r_k (b_k − a_k)    (1.15)

Relations (1.13), (1.14) and (1.15) give:

2r_k − 1 = (1 − r_{k+1}) r_k    (1.16)

or

r_{k+1} = (1 − r_k)/r_k    (1.17)

Now we introduce the Fibonacci numbers. Replacing:

r_k = F_{n−k}/F_{n−k+1}    (1.18)

we get:

r_{k+1} = (1 − F_{n−k}/F_{n−k+1}) / (F_{n−k}/F_{n−k+1}) = (F_{n−k+1} − F_{n−k})/F_{n−k} = F_{n−k−1}/F_{n−k}    (1.19)

For k = 1, ..., n − 1 the ratio r_k takes the following values:

r_1 = F_{n−1}/F_n
r_2 = F_{n−2}/F_{n−1}
...
r_{n−2} = F_2/F_3
r_{n−1} = F_1/F_2 = 1/2    (1.20)

At the last step, n − 1, no new points can be added. The distance d_{n−1} = d_{n−2} r_{n−1} = d_{n−2}/2, i.e. at step n − 1 the points x_{1,n−1} and x_{2,n−1} coincide, and:

r_n = F_0/F_1 = 1/1 = 1    (1.21)

The last interval obtained is d_n = r_n(b_n − a_n) = r_n d_{n−1} = d_{n−1} and no other steps are possible. It may be written as:

d_n = r_n d_{n−1} = (F_0/F_1) d_{n−1} = (F_0/F_1)(F_1/F_2) d_{n−2} = ... = (F_0/F_1)(F_1/F_2)···(F_{n−2}/F_{n−1})(F_{n−1}/F_n) d_0 = (F_0/F_n) d_0 = d_0/F_n    (1.22)

The distance d_0 is the initial length of the interval of uncertainty, d_0 = b − a. If the minimizer is to be found with a tolerance ε, then the length of the last interval must satisfy:

(1/F_n)(b − a) < ε    (1.23)

and the number of iterations needed to reach the minimum with a maximum error ε is the smallest integer n that makes the following inequality true:

F_n > (b − a)/ε    (1.24)

Algorithm 2 describes the Fibonacci search.

Algorithm 2 Fibonacci Search
  Define function f(x)
  Define boundaries a, b and tolerance ε
  Initialize the Fibonacci sequence F_0 = F_1 = 1
  while F_n ≤ (b − a)/ε do
    Calculate Fibonacci numbers F_n = F_{n−1} + F_{n−2}
  end while
  d = b − a
  for k = 1 : n do
    d ← d × F_{n−k}/F_{n−k+1}
    x_1 ← b − d
    x_2 ← a + d
    if f(x_1) ≤ f(x_2) then
      b ← x_2
    else
      a ← x_1
    end if
  end for

If the number of iterations has to be determined from knowledge of the tolerance ε, the Fibonacci numbers are calculated and stored until the inequality (1.24) becomes true. If the number of iterations is given, a more convenient way to calculate the Fibonacci numbers is the explicit formula:

F_n = (1/√5)[((1 + √5)/2)^n − ((1 − √5)/2)^n]    (1.25)

Both algorithms for single-variable function optimization presented here can be applied when the function f is not differentiable. For a small number of iterations the Fibonacci method is more efficient, since it provides the smallest final interval of uncertainty in a given number of steps. When n is large the two methods are almost identical, because the ratio F_{n−k}/F_{n−k+1} converges to the golden ratio 0.618 as n − k grows.
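The following Python sketch mirrors Algorithm 2. The names fibonacci_numbers and fibonacci_search and the test problem are choices made for this sketch; in addition, the loop stops at k = n − 1, since at that step the two interior points already coincide and the final ratio r_n = 1 would not shrink the interval further.

```python
def fibonacci_numbers(a, b, tol):
    """Build the Fibonacci sequence until F_n > (b - a)/tol, as in inequality (1.24)."""
    F = [1, 1]                             # F_0 = F_1 = 1
    while F[-1] <= (b - a) / tol:
        F.append(F[-1] + F[-2])
    return F                               # F[-1] is F_n, with n = len(F) - 1

def fibonacci_search(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] with the Fibonacci search (Algorithm 2)."""
    F = fibonacci_numbers(a, b, tol)
    n = len(F) - 1
    d = b - a
    # Iterate for k = 1, ..., n-1: the last ratio used is F_1/F_2 = 1/2,
    # which places both interior points at the midpoint of the interval.
    for k in range(1, n):
        d *= F[n - k] / F[n - k + 1]       # d_k = r_k * d_{k-1}
        x1, x2 = b - d, a + d
        if f(x1) <= f(x2):
            b = x2
        else:
            a = x1
    return (a + b) / 2

if __name__ == "__main__":
    # Same test problem as before: the minimum of (x - 2)^2 + 1 is at x = 2
    print(fibonacci_search(lambda x: (x - 2) ** 2 + 1, 0.0, 5.0))
```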
1.2 Algorithms for unconstrained multivariable optimization

1.2.1 Basic concepts

Over the last forty years many powerful algorithms have been developed for unconstrained function optimization. All of them require an initial estimate of the optimum point, denoted by x_0, chosen from knowledge about the application and the data set. Beginning at x_0, optimization algorithms generate a sequence of iterates x_k, k = 1, 2, ..., that terminates either when no more progress can be made or when a solution point appears to have been approximated with sufficient accuracy. The decision to move from one point x_k to the next is based on the information about the function f at x_k, and aims to find a new iterate x_{k+1} with a lower function value than x_k (Snyman, 2005; Nocedal and Wright, 1999). The algorithms terminate when one, or all, of the following conditions hold:

• two successive points are closer than a specified tolerance ε_1:

‖x_{k+1} − x_k‖ < ε_1    (1.26)

• the function value calculated at two successive points does not vary by more than a given tolerance ε_2:

|f(x_{k+1}) − f(x_k)| < ε_2    (1.27)

• the gradient of f is sufficiently close to 0:

‖∇f(x_k)‖ < ε_3    (1.28)

where ‖X‖ is the Euclidean norm of a vector X = [x_1 x_2 ... x_n]^T, given by:

‖X‖ = √(x_1² + x_2² + ... + x_n²)    (1.29)

In general, there are two strategies for moving from one point to the next: line search and trust region. An extensive description of algorithms from both categories can be found in (Nocedal and Wright, 1999; Snyman, 2005; Conn et al., 2000).
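As a small illustration of how the termination tests (1.26)–(1.28) can be combined, consider the sketch below. The helper name converged and the default tolerance values are assumptions made here, not part of the source; the tests are combined with a logical or, matching the "one, or all" wording above.

```python
import numpy as np

def converged(x_prev, x_next, f, grad, eps1=1e-8, eps2=1e-8, eps3=1e-6):
    """Return True if any of the termination tests (1.26)-(1.28) is satisfied.

    x_prev, x_next : successive iterates (NumPy arrays)
    f, grad        : objective function and its gradient, both callables
    """
    step_small  = np.linalg.norm(x_next - x_prev) < eps1     # condition (1.26)
    value_small = abs(f(x_next) - f(x_prev)) < eps2          # condition (1.27)
    grad_small  = np.linalg.norm(grad(x_next)) < eps3        # condition (1.28)
    return step_small or value_small or grad_small
```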
1.2.1.1 Line search algorithms

Searching along a line for a minimum point is a component of most nonlinear programming algorithms. To initiate a line search with respect to a function f, two vectors must be specified: the initial point x_0 and the direction d in which the search is to be made (Luenberger, 2003).

Figure 1.4: Sequence of steps for minimization of a two-variable function, f(x) = 3x_1² − sin(x_2)

Algorithms of this type calculate a new point x_{k+1} (Figure 1.4) from:

x_{k+1} = x_k + s_k d_k    (1.30)

where x_k and d_k are the current point and direction of search, and s_k is the step length, which is controlled by solving the one-dimensional problem of minimizing:

F(s_k) = f(x_k + s_k d_k)    (1.31)

Large values of the step length may lead to the situation where the minimum point is not reached in a small number of iterations, as shown in Figure 1.5: the function f is reduced at every step, but by a very small amount compared to the step length.

Figure 1.5: Large step size

The general structure of a line search descent method is given in Algorithm 3. Descent methods within this class differ according to the way in which the descent directions are chosen. Another important consideration is the method by which the one-dimensional line search is performed (Snyman, 2005).

Algorithm 3 General scheme for line search algorithms
  Define function f(x) and tolerance ε
  Select a starting point x_0
  repeat
    Select a descent direction d_k
    Perform a one-dimensional line search in the direction d_k, i.e.
      min_{s_k} F(s_k) = min_{s_k} f(x_k + s_k d_k)
    or compute a step length that provides a decrease in the value of f (e.g. using Armijo, Goldstein or Wolfe rules)
    Compute x_{k+1} = x_k + s_k d_k
  until one of the conditions (1.26), (1.27) or (1.28) is true

One possibility to control the step size is to implement a procedure for minimization of F(s_k) based on the golden section or Fibonacci search presented in the previous sections. Sophisticated algorithms for one-dimensional minimization may not be a reasonable choice, because they would require a large number of function evaluations and thus a significant increase in computation time.
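To make Algorithm 3 concrete, the sketch below pairs a simple descent direction, d_k = −∇f(x_k) (used here only as an example; gradient methods are treated properly in Section 1.2.3), with a golden-section minimization of F(s_k) over [0, s_max], as suggested above. The maximum step length s_max, the finite-difference gradient and all routine names are assumptions of this sketch, and F(s) is assumed to be unimodal on [0, s_max].

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient (an assumption of this sketch)."""
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def golden_section_1d(F, a, b, tol=1e-6):
    """Golden section search on [a, b], as in Algorithm 1."""
    r = (np.sqrt(5) - 1) / 2
    d = b - a
    while b - a >= tol:
        d = r * d
        s1, s2 = b - d, a + d
        if F(s1) <= F(s2):
            b = s2
        else:
            a = s1
    return (a + b) / 2

def line_search_descent(f, x0, tol=1e-6, s_max=1.0, max_iter=500):
    """Algorithm 3 with d_k = -grad f(x_k) and an exact golden-section line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = numerical_gradient(f, x)
        if np.linalg.norm(g) < tol:                    # condition (1.28)
            break
        d = -g                                         # a simple descent direction
        s = golden_section_1d(lambda s: f(x + s * d),  # minimize F(s_k), eq. (1.31)
                              0.0, s_max, tol)
        x_new = x + s * d                              # step (1.30)
        if np.linalg.norm(x_new - x) < tol:            # condition (1.26)
            x = x_new
            break
        x = x_new
    return x

if __name__ == "__main__":
    # Example: minimize f(x, y) = (x - 1)^2 + 2*(y + 1)^2 starting from (3, 3)
    print(line_search_descent(lambda v: (v[0] - 1) ** 2 + 2 * (v[1] + 1) ** 2, [3.0, 3.0]))
```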
More practical strategies perform an inexact line search to identify a step length that achieves an adequate reduction in f. Implementations of line search algorithms try out a sequence of candidates for s_k, stopping to accept one of these values when certain conditions are satisfied. A detailed description of the Armijo, Goldstein and Wolfe conditions is given in (Nocedal and Wright, 1999; Snyman, 2005; Luenberger, 2003).

Algorithm 4 Backtracking line search
  Select an initial step length s_0 (e.g. s_0 = 1) and set i = 0
  repeat
    i ← i + 1
    Set s_i = τ s_{i−1}, where τ ∈ (0, 1) (e.g. τ = 1/2)
  until f(x_k + s_i d_k) < f(x_k)
  Set s_k = s_i

A basic backtracking procedure for line search is given in Algorithm 4 (Gould and Leyffer, 2002). For a given direction d_k at the current point x_k, this algorithm decreases the step size until the value of f at the next point is smaller than f(x_k). It prevents the step from getting too small, but does not prevent step sizes that are large relative to the decrease of f. The Armijo condition is formulated so that it provides a sufficient decrease in f. Consider the function F defined by (1.31). For a fixed α, 0 < α < 1, the step s_k is considered to be not too large and to give a sufficient decrease of f if (Luenberger, 2003; Gould and Leyffer, 2002):

F(s_k) ≤ F(0) + α F′(0) s_k    (1.32)

Condition (1.32) is illustrated in Figure 1.6.

Figure 1.6: Armijo condition

In terms of the original notation, condition (1.32) is obtained from:

F(0) = f(x_k + 0 · d_k) = f(x_k)    (1.33)
F′(s_k) = d_k^T ∇f(x_k + s_k d_k)    (1.34)
F′(0) = d_k^T ∇f(x_k)    (1.35)
F(0) + α F′(0) s_k = f(x_k) + α d_k^T ∇f(x_k) s_k    (1.36)

and (1.32) becomes:

f(x_k + s_k d_k) ≤ f(x_k) + α d_k^T ∇f(x_k) s_k    (1.37)

A backtracking-Armijo procedure is given below in Algorithm 5.

Other popular approaches to determine an appropriate step size are the Goldstein and Wolfe tests (Luenberger, 2003; Nocedal and Wright, 1999). The Goldstein test for the step length requires that, for an α selected such that 0 < α < 1/2:

f(x_k) + (1 − α) d_k^T ∇f(x_k) s_k ≤ f(x_k + s_k d_k) ≤ f(x_k) + α d_k^T ∇f(x_k) s_k    (1.38)

The Wolfe test requires a condition in addition to relation (1.37) to be satisfied:

f(x_k + s_k d_k) ≤ f(x_k) + α_1 d_k^T ∇f(x_k) s_k    (1.39)
d_k^T ∇f(x_k + s_k d_k) ≥ α_2 d_k^T ∇f(x_k)    (1.40)

where 0 < α_1 < α_2 < 1.

Algorithm 5 Backtracking Armijo
  Select an initial step length s_0 (e.g. s_0 = 1), α ∈ (0, 1) and set i = 0
  repeat
    i ← i + 1
    Set s_i = τ s_{i−1}, where τ ∈ (0, 1) (e.g. τ = 1/2)
  until f(x_k + s_i d_k) ≤ f(x_k) + α d_k^T ∇f(x_k) s_i
  Set s_k = s_i
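A minimal Python version of Algorithm 5 might look as follows. The routine name backtracking_armijo and the default parameter values are assumptions of this sketch; unlike the literal pseudocode, it also accepts the initial step s_0 when that step already satisfies (1.37), and it caps the number of reductions.

```python
import numpy as np

def backtracking_armijo(f, grad_f, x, d, s0=1.0, alpha=1e-4, tau=0.5, max_iter=50):
    """Backtracking line search with the Armijo condition (1.37).

    f, grad_f : objective and gradient callables
    x, d      : current point and descent direction (NumPy arrays)
    Returns a step length s with f(x + s d) <= f(x) + alpha * s * d^T grad_f(x).
    """
    fx = f(x)
    slope = d @ grad_f(x)          # F'(0) = d^T grad f(x), negative for a descent direction
    s = s0
    for _ in range(max_iter):
        if f(x + s * d) <= fx + alpha * s * slope:    # Armijo condition (1.37)
            return s
        s *= tau                                      # shrink the step: s_i = tau * s_{i-1}
    return s                                          # fall back to the last (small) step

if __name__ == "__main__":
    # Usage with the quadratic example from before, taking d = -grad f(x)
    f = lambda v: (v[0] - 1) ** 2 + 2 * (v[1] + 1) ** 2
    g = lambda v: np.array([2 * (v[0] - 1), 4 * (v[1] + 1)])
    x = np.array([3.0, 3.0])
    print(backtracking_armijo(f, g, x, -g(x)))
```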
1.2.1.2 Trust region methods

Another strategy approximates the function f to be minimized with a model function m_k in a neighborhood N, named the trust region, around the current point x_k. The subproblem to be solved is now the minimization of the model function over N. If the solution does not produce a sufficient decrease of f, the trust region is shrunk around x_k. The model function m_k is usually a quadratic form obtained from a two-term Taylor series approximation. The general algorithm can be summarized as:

• Choose a maximum distance from the current point - the trust region radius
• Find a direction and a step that produce the best improvement possible in this region
• If the step is unsatisfactory, reduce the size of the trust region and start again

Algorithms using the trust region strategy are discussed in detail in (Nocedal and Wright, 1999; Conn et al., 2000).

1.2.2 Newton methods

1.2.2.1 The algorithm

Newton's method (also known as the Newton-Raphson method), with all its variations, is one of the most important methods in numerical unconstrained optimization. Basically, it is a numerical procedure for finding the roots of a real-valued function. It can also be used to find local extrema of functions, using the necessary condition of unconstrained optimization for a point to be stationary. If f : R^n → R is a multivariable function to be minimized or maximized, a point x_0 is a stationary point of f if the gradient calculated at this location is zero, ∇f(x_0) = 0. Thus, Newton's method can be applied to find the roots of ∇f(x).

Assume the function f is twice differentiable and the Hessian H(x) is continuous. The elements of H(x) are the second derivatives of the objective function, H_ij = ∂²f/∂x_i∂x_j. If the Hessian is not available and its computation is difficult or not possible, other methods must be applied.

Newton's method for the minimization of f(x) can be derived by assuming that, given x_k, the next point x_{k+1} is obtained by minimizing a quadratic approximation of f. The Taylor expansion of f(x_{k+1}) around x_k is:

f(x_{k+1}) ≈ f(x_k) + (x_{k+1} − x_k)^T ∇f(x_k) + (1/2)(x_{k+1} − x_k)^T H(x_k)(x_{k+1} − x_k)    (1.41)

The stationary point of (1.41) is computed by setting the gradient of f(x_{k+1}) equal to zero:

∇f(x_{k+1}) = ∇f(x_k) + H(x_k)(x_{k+1} − x_k) = 0    (1.42)

The third term in (1.41) was differentiated with respect to x_{k+1} according to the rule:

∂(x^T A x)/∂x = 2Ax, for A symmetrical    (1.43)

From (1.42):

x_{k+1} = x_k − H(x_k)^{−1} ∇f(x_k)    (1.44)

which is the iterative scheme of the method. For a multivariable function f(x), Newton's method can be described as in Algorithm 6. The stopping criterion for this algorithm can be one or more of the conditions (1.26), (1.27) or (1.28), or, as results from the same relations, the requirement that the norm of H(x_k)^{−1}∇f(x_k) be sufficiently small:

‖H(x_k)^{−1}∇f(x_k)‖ ≤ ε    (1.45)

Algorithm 6 Newton Search Method
  Define f(x) and a tolerance ε
  Compute the gradient ∇f(x) symbolically
  Compute the Hessian matrix H(x) symbolically
  Choose an initial point x_0
  Set k = 0
  while ‖H(x_k)^{−1}∇f(x_k)‖ > ε do
    Calculate a new point x_{k+1} = x_k − H(x_k)^{−1}∇f(x_k)
    Set k ← k + 1
  end while

Note that for a single-variable function f(x) the iterative scheme can be written as:

x_{k+1} = x_k − f′(x_k)/f″(x_k)
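A compact Python sketch of Algorithm 6 is given below. The symbolic gradient and Hessian of the pseudocode are replaced here by user-supplied callables (an assumption of this sketch), and the Newton step is obtained by solving the linear system H(x_k) p = ∇f(x_k) rather than forming the inverse explicitly, which is a common implementation choice.

```python
import numpy as np

def newton_method(grad, hess, x0, tol=1e-8, max_iter=100):
    """Newton's method (Algorithm 6): x_{k+1} = x_k - H(x_k)^{-1} grad f(x_k).

    grad, hess : callables returning the gradient vector and Hessian matrix
    x0         : initial point
    Stops when the Newton step H(x_k)^{-1} grad f(x_k) is small, as in (1.45).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(x), grad(x))   # solve H(x_k) p = grad f(x_k)
        if np.linalg.norm(step) <= tol:            # stopping test (1.45)
            break
        x = x - step                               # iteration (1.44)
    return x

if __name__ == "__main__":
    # Example: f(x, y) = (x - 1)^2 + 2*(y + 1)^2, whose minimizer is (1, -1);
    # for a quadratic function a single Newton step reaches the minimum.
    grad = lambda v: np.array([2 * (v[0] - 1), 4 * (v[1] + 1)])
    hess = lambda v: np.array([[2.0, 0.0], [0.0, 4.0]])
    print(newton_method(grad, hess, [3.0, 3.0]))
```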