METHODS FOR NON-LINEAR LEAST SQUARES PROBLEMS
2nd Edition, April 2004
K. Madsen, H.B. Nielsen, O. Tingleff
IMM, Informatics and Mathematical Modelling
Technical University of Denmark
CONTENTS
1. INTRODUCTION AND DEFINITIONS
2. DESCENT METHODS
   2.1. The Steepest Descent Method
   2.2. Newton’s Method
   2.3. Line Search
   2.4. Trust Region and Damped Methods
3. NON-LINEAR LEAST SQUARES PROBLEMS
   3.1. The Gauss–Newton Method
   3.2. The Levenberg–Marquardt Method
   3.3. Powell’s Dog Leg Method
   3.4. A Hybrid Method: L–M and Quasi-Newton
   3.5. A Secant Version of the L–M Method
   3.6. A Secant Version of the Dog Leg Method
   3.7. Final Remarks
APPENDIX
REFERENCES
INDEX
1. INTRODUCTION AND DEFINITIONS
In this booklet we consider the following problem.

Definition 1.1. Least Squares Problem.
Find x*, a local minimizer for¹⁾

    F(x) = ½ ∑_{i=1}^{m} (f_i(x))² ,

where f_i : ℝⁿ → ℝ, i = 1,…,m are given functions, and m ≥ n.
Example 1.1. An important source of least squares problems is data fitting. As an example consider the data points (t₁, y₁), …, (t_m, y_m) shown below.

Figure 1.1. Data points {(t_i, y_i)} (marked by +) and model M(x, t) (marked by full line).

Further, we are given a fitting model,

    M(x, t) = x₃ e^{x₁ t} + x₄ e^{x₂ t} .
1) The factor ½ in the definition of F(x) has no effect on x*. It is introduced for convenience; see page 18.
The model depends on the parameters x = [x₁, x₂, x₃, x₄]^T. We assume that there exists an x† so that

    y_i = M(x†, t_i) + ε_i ,

where the {ε_i} are (measurement) errors on the data ordinates, assumed to behave like “white noise”.

For any choice of x we can compute the residuals

    f_i(x) = y_i − M(x, t_i)
           = y_i − x₃ e^{x₁ t_i} − x₄ e^{x₂ t_i} ,   i = 1,…,m .

For a least squares fit the parameters are determined as the minimizer x* of the sum of squared residuals. This is seen to be a problem of the form in Definition 1.1 with n = 4. The graph of M(x*, t) is shown by full line in Figure 1.1.
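The residuals and the objective of Example 1.1 can be sketched in code as follows; the data arrays and the parameter values below are hypothetical, made up only to exercise the functions:

```python
import numpy as np

def residuals(x, t, y):
    """f_i(x) = y_i - M(x, t_i) for the model M(x,t) = x3*exp(x1*t) + x4*exp(x2*t)."""
    x1, x2, x3, x4 = x
    return y - x3 * np.exp(x1 * t) - x4 * np.exp(x2 * t)

def F(x, t, y):
    """Objective F(x) = 1/2 * sum of squared residuals (Definition 1.1)."""
    f = residuals(x, t, y)
    return 0.5 * f @ f

# Hypothetical data generated from known parameters plus white noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
x_true = np.array([-4.0, -5.0, 4.0, -4.0])
y = x_true[2] * np.exp(x_true[0] * t) + x_true[3] * np.exp(x_true[1] * t)
y += 0.01 * rng.standard_normal(t.size)

print(F(x_true, t, y))  # small: only the noise contributes
```

A least squares solver then varies x so as to drive this F towards its minimum.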
A least squares problem is a special variant of the more general problem: Given a function F : ℝⁿ → ℝ, find an argument of F that gives the minimum value of this so-called objective function or cost function.

Definition 1.2. Global Minimizer.
Given F : ℝⁿ → ℝ. Find

    x⁺ = argmin_x { F(x) } .

This problem is very hard to solve in general, and we only present methods for solving the simpler problem of finding a local minimizer for F, an argument vector which gives a minimum value of F inside a certain region whose size is given by δ, where δ is a small, positive number.

Definition 1.3. Local Minimizer.
Given F : ℝⁿ → ℝ. Find x* so that

    F(x*) ≤ F(x)   for ‖x − x*‖ < δ .
In the remainder of this introduction we shall discuss some basic concepts in
optimization, and Chapter 2 is a brief review of methods for finding a local
minimizer for general cost functions. For more details we refer to Frandsen
et al (2004). In Chapter 3 we give methods that are specially tuned for least
squares problems.
We assume that the cost function F is differentiable and so smooth that the following Taylor expansion is valid,²⁾

    F(x + h) = F(x) + h^T g + ½ h^T H h + O(‖h‖³) ,   (1.4a)

where g is the gradient,

    g ≡ F′(x) = [ ∂F/∂x₁ (x), …, ∂F/∂x_n (x) ]^T ,   (1.4b)

and H is the Hessian,

    H ≡ F″(x) = [ ∂²F/(∂x_i ∂x_j) (x) ] .   (1.4c)
If x* is a local minimizer and ‖h‖ is sufficiently small, then we cannot find a point x* + h with a smaller F-value. Combining this observation with (1.4a) we get

Theorem 1.5. Necessary condition for a local minimizer.
If x* is a local minimizer, then

    g* ≡ F′(x*) = 0 .

We use a special name for arguments that satisfy the necessary condition:

Definition 1.6. Stationary point.
If

    g_s ≡ F′(x_s) = 0 ,

then x_s is said to be a stationary point for F.
2) Unless otherwise specified, ‖·‖ denotes the 2-norm, ‖h‖ = √(h₁² + ⋯ + h_n²).
Thus, a local minimizer is also a stationary point, but so is a local maximizer. A stationary point which is neither a local maximizer nor a local minimizer is called a saddle point. In order to determine whether a given stationary point is a local minimizer or not, we need to include the second order term in the Taylor series (1.4a). Inserting x_s we see that

    F(x_s + h) = F(x_s) + ½ h^T H_s h + O(‖h‖³)   with H_s = F″(x_s) .   (1.7)

From definition (1.4c) of the Hessian it follows that any H is a symmetric matrix. If we request that H_s is positive definite, then its eigenvalues are greater than some number δ > 0 (see Appendix A), and

    h^T H_s h > δ ‖h‖² .

This shows that for ‖h‖ sufficiently small the third term on the right-hand side of (1.7) will be dominated by the second. This term is positive, so that we get

Theorem 1.8. Sufficient condition for a local minimizer.
Assume that x_s is a stationary point and that F″(x_s) is positive definite. Then x_s is a local minimizer.

If H_s is negative definite, then x_s is a local maximizer. If H_s is indefinite (ie it has both positive and negative eigenvalues), then x_s is a saddle point.
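The eigenvalue-based classification just described can be sketched as follows (the sample Hessians are hypothetical):

```python
import numpy as np

def classify_stationary_point(H, tol=1e-10):
    """Classify a stationary point from the symmetric Hessian H = F''(x_s)."""
    eigvals = np.linalg.eigvalsh(H)  # eigvalsh: eigenvalues of a symmetric matrix
    if np.all(eigvals > tol):
        return "local minimizer"     # H positive definite (Theorem 1.8)
    if np.all(eigvals < -tol):
        return "local maximizer"     # H negative definite
    if eigvals.min() < -tol and eigvals.max() > tol:
        return "saddle point"        # indefinite
    return "inconclusive"            # some eigenvalue is (numerically) zero

print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, 3.0]])))   # local minimizer
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -3.0]])))  # saddle point
```

When some eigenvalue is zero the second order term gives no verdict, which is why the sketch returns "inconclusive" in that case.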
2. DESCENT METHODS
All methods for non-linear optimization are iterative: From a starting point x₀ the method produces a series of vectors x₁, x₂, …, which (hopefully) converges to x*, a local minimizer for the given function, see Definition 1.3. Most methods have measures which enforce the descending condition

    F(x_{k+1}) < F(x_k) .   (2.1)

This prevents convergence to a maximizer and also makes it less probable that we converge towards a saddle point. If the given function has several minimizers the result will depend on the starting point x₀. We do not know which of the minimizers will be found; it is not necessarily the minimizer closest to x₀.

In many cases the method produces vectors which converge towards the minimizer in two clearly different stages. When x₀ is far from the solution we want the method to produce iterates which move steadily towards x*. In this “global stage” of the iteration we are satisfied if the errors do not increase except in the very first steps, ie

    ‖e_{k+1}‖ < ‖e_k‖   for k > K ,

where e_k denotes the current error,

    e_k = x_k − x* .   (2.2)
In the final stage of the iteration, where x_k is close to x*, we want faster convergence. We distinguish between

Linear convergence:

    ‖e_{k+1}‖ ≤ a ‖e_k‖   when ‖e_k‖ is small; 0 < a < 1 ,   (2.3a)

Quadratic convergence:

    ‖e_{k+1}‖ = O(‖e_k‖²)   when ‖e_k‖ is small ,   (2.3b)

Superlinear convergence:

    ‖e_{k+1}‖ / ‖e_k‖ → 0   for k → ∞ .   (2.3c)
The methods presented in this lecture note are descent methods which satisfy the descending condition (2.1) in each step of the iteration. One step from the current iterate consists in

1. Find a descent direction h_d (discussed below), and
2. find a step length giving a good decrease in the F-value.

Thus an outline of a descent method is

Algorithm 2.4. Descent method
    begin
        k := 0;  x := x₀;  found := false              {starting point}
        while (not found) and (k < k_max)
            h_d := search_direction(x)                 {from x and downhill}
            if (no such h_d exists)
                found := true                          {x is stationary}
            else
                α := step_length(x, h_d)               {from x in direction h_d}
                x := x + α h_d;  k := k + 1            {next iterate}
    end

Consider the variation of the F-value along the half line starting at x and with direction h. From the Taylor expansion (1.4a) we see that

    F(x + αh) = F(x) + α h^T F′(x) + O(α²)
              ≈ F(x) + α h^T F′(x)   for α sufficiently small.   (2.5)

We say that h is a descent direction if F(x + αh) is a decreasing function of α at α = 0. This leads to the following definition.
Definition 2.6. Descent direction.
h is a descent direction for F at x if

    h^T F′(x) < 0 .

If no such h exists, then F′(x) = 0, showing that in this case x is stationary. Otherwise, we have to choose α, ie how far we should go from x in the direction given by h_d, so that we get a decrease in the value of the objective function. One way of doing this is to find (an approximation to)

    α_e = argmin_{α>0} { F(x + αh) } .   (2.7)

The process is called line search, and is discussed in Section 2.3. First, however, we shall introduce two methods for computing a descent direction.
2.1. The Steepest Descent Method
From (2.5) we see that when we perform a step αh with positive α, then the relative gain in function value satisfies

    lim_{α→0} (F(x) − F(x + αh)) / (α‖h‖) = −(1/‖h‖) h^T F′(x) = −‖F′(x)‖ cos θ ,

where θ is the angle between the vectors h and F′(x). This shows that we get the greatest gain rate if θ = π, ie if we use the steepest descent direction h_sd given by

    h_sd = −F′(x) .   (2.8)

The method based on (2.8) (ie h_d = h_sd in Algorithm 2.4) is called the steepest descent method or gradient method. The choice of descent direction is “the best” (locally) and we could combine it with an exact line search (2.7). A method like this converges, but the final convergence is linear and often very slow. Examples in Frandsen et al (2004) show how the steepest descent method with exact line search and finite computer precision can fail to find the minimizer of a second degree polynomial. For many problems, however, the method has quite good performance in the initial stage of the iterative process.
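As an illustration, the steepest descent method can be sketched with a crude step-halving strategy standing in for the line search of Section 2.3; the quadratic test function below is hypothetical:

```python
import numpy as np

def steepest_descent(F, gradF, x0, tol=1e-6, k_max=10000):
    """Steepest descent (h_sd = -F'(x)) with simple backtracking in place of
    a proper line search. F, gradF and the start point x0 are supplied by the caller."""
    x = np.asarray(x0, dtype=float)
    for _ in range(k_max):
        g = gradF(x)
        if np.linalg.norm(g) < tol:       # x is (nearly) stationary
            break
        h = -g                            # steepest descent direction (2.8)
        alpha = 1.0
        # Halve alpha until the descending condition (2.1) holds
        while F(x + alpha * h) >= F(x) and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * h
    return x

# Hypothetical test problem: F(x) = 1/2 x^T A x with minimizer [0, 0]
A = np.array([[3.0, 1.0], [1.0, 2.0]])
F = lambda x: 0.5 * x @ A @ x
gradF = lambda x: A @ x
print(steepest_descent(F, gradF, [4.0, -3.0]))  # approaches [0, 0]
```

On this well-conditioned quadratic the iteration converges quickly; the slow linear convergence mentioned above shows up when the Hessian is badly conditioned.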
Considerations like this have led to the so-called hybrid methods, which, as the name suggests, are based on two different methods: one which is good in the initial stage, like the gradient method, and another which is good in the final stage, like Newton’s method; see the next section. A major problem with a hybrid method is the mechanism which switches between the two methods when appropriate.
2.2. Newton’s Method
We can derive this method from the condition that x* is a stationary point. According to Definition 1.6 it satisfies F′(x*) = 0. This is a nonlinear system of equations, and from the Taylor expansion

    F′(x + h) = F′(x) + F″(x)h + O(‖h‖²)
              ≈ F′(x) + F″(x)h   for ‖h‖ sufficiently small

we derive Newton’s method: Find h_n as the solution to

    H h_n = −F′(x)   with H = F″(x) ,   (2.9a)

and compute the next iterate by

    x := x + h_n .   (2.9b)

Suppose that H is positive definite. Then it is nonsingular (implying that (2.9a) has a unique solution), and u^T H u > 0 for all nonzero u. Thus, by multiplying with h_n^T on both sides of (2.9a) we get

    0 < h_n^T H h_n = −h_n^T F′(x) ,   (2.10)

showing that h_n is a descent direction: it satisfies the condition in Definition 2.6.
Newton’s method is very good in the final stage of the iteration, where x is close to x*. One can show (see Frandsen et al (2004)) that if the Hessian at the solution is positive definite (the sufficient condition in Theorem 1.8 is satisfied) and if we are at a position inside the region around x* where
F″(x) is positive definite, then we get quadratic convergence (defined in (2.3)). On the other hand, if x is in a region where F″(x) is negative definite everywhere, and where there is a stationary point, the basic Newton method (2.9) would converge (quadratically) towards this stationary point, which is a maximizer. We can avoid this by requiring that all steps taken are in descent directions.

We can build a hybrid method, based on Newton’s method and the steepest descent method. According to (2.10) the Newton step is guaranteed to be downhill if F″(x) is positive definite, so a sketch of the central section of this hybrid algorithm could be

    if F″(x) is positive definite
        h := h_n
    else
        h := h_sd
    x := x + αh
    (2.11)

Here, h_sd is the steepest descent direction and α is found by line search; see Section 2.3. A good tool for checking a matrix for positive definiteness is Cholesky’s method (see Appendix A) which, when successful, is also used for solving the linear system in question. Thus, the check for definiteness is almost for free.
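A sketch of this Cholesky-based switch, under the assumption that NumPy’s `cholesky` (which raises `LinAlgError` precisely when the matrix is not positive definite) is an acceptable stand-in for the factorization of Appendix A:

```python
import numpy as np

def hybrid_direction(g, H):
    """Central section of the hybrid method (2.11): attempt a Cholesky
    factorization of H = F''(x); if it succeeds, H is positive definite and
    the factor is reused to solve H h_n = -g; otherwise fall back to the
    steepest descent direction h_sd = -g."""
    try:
        L = np.linalg.cholesky(H)        # fails iff H is not positive definite
        # Solve H h = -g via the factorization: L (L^T h) = -g
        z = np.linalg.solve(L, -g)       # forward substitution (generic solver for brevity)
        return np.linalg.solve(L.T, z)   # back substitution: Newton direction h_n
    except np.linalg.LinAlgError:
        return -g                        # steepest descent direction h_sd

# Positive definite Hessian gives the Newton step; indefinite gives the gradient step
g = np.array([1.0, 2.0])
print(hybrid_direction(g, np.array([[2.0, 0.0], [0.0, 4.0]])))   # [-0.5, -0.5]
print(hybrid_direction(g, np.array([[1.0, 0.0], [0.0, -1.0]])))  # [-1.0, -2.0]
```

Both returned directions satisfy h^T F′(x) < 0, so either branch yields a descent direction in the sense of Definition 2.6.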
In Section 2.4 we introduce some methods, where the computation of the search direction h_d and step length α is done simultaneously, and give a version of (2.11) without line search. Such hybrid methods can be very efficient, but they are hardly ever used. The reason is that they need an implementation of F″(x), and for complicated application problems this is not available. Instead we can use a so-called Quasi-Newton method, based on series of matrices which gradually approach H* = F″(x*). In Section 3.4 we present such a method. Also see Chapter 5 in Frandsen et al (2004).
2.3. Line Search
Given a point x and a descent direction h. The next iteration step is a move from x in direction h. To find out how far to move, we study the variation of the given function along the half line from x in the direction h,

    φ(α) = F(x + αh) ,   x and h fixed, α ≥ 0 .   (2.12)

An example of the behaviour of φ(α) is shown in Figure 2.1.

Figure 2.1. Variation of the cost function along the search line, y = φ(α).

Our h being a descent direction ensures that

    φ′(0) = h^T F′(x) < 0 ,

indicating that if α is sufficiently small, we satisfy the descending condition (2.1), which is equivalent to

    φ(α) < φ(0) .

Often, we are given an initial guess on α, eg α = 1 with Newton’s method. Figure 2.1 illustrates that three different situations can arise

1° α is so small that the gain in value of the objective function is very small. α should be increased.
2° α is too large: φ(α) ≥ φ(0). Decrease α in order to satisfy the descent condition (2.1).
3° α is close to the minimizer¹⁾ of φ(α). Accept this α-value.

1) More precisely: the smallest local minimizer of φ. If we increase α beyond the interval shown in Figure 2.1, it may well happen that we get close to another local minimum for F.
An exact line search is an iterative process producing a series α₁, α₂, … . The aim is to find the true minimizer α_e defined in (2.7), and the algorithm stops when the iterate α_s satisfies

    |φ′(α_s)| ≤ τ |φ′(0)| ,

where τ is a small, positive number. In the iteration we can use approximations to the variation of φ(α) based on the computed values of

    φ(α_k) = F(x + α_k h)   and   φ′(α_k) = h^T F′(x + α_k h) .

See Sections 2.5–2.6 in Frandsen et al (2004) for details.

Exact line search can waste much computing time: When x is far from x* the search direction h may be far from the direction x* − x, and there is no need to find the true minimum of φ very accurately. This is the background for the so-called soft line search, where we accept an α-value if it does not fall in the categories 1° or 2° listed above. We use a stricter version of the descending condition (2.1), viz

    φ(α_s) ≤ φ(0) + γ₁ · φ′(0) · α   with 0 < γ₁ < 1 .   (2.13a)

This ensures that we are not in case 2°. Case 1° corresponds to the point (α, φ(α)) being too close to the starting tangent, and we supplement with the condition

    φ′(α_s) ≥ γ₂ · φ′(0)   with γ₁ < γ₂ < 1 .   (2.13b)

If the starting guess on α satisfies both these criteria, then we accept it as α_s. Otherwise, we have to iterate as sketched for exact line search. Details can be seen in Section 2.5 of Frandsen et al (2004).
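The acceptance test of the soft line search, conditions (2.13a) and (2.13b), can be sketched as a small predicate; the one-dimensional φ below and the parameter values are hypothetical:

```python
def soft_accept(phi, dphi, alpha, gamma1=1e-4, gamma2=0.9):
    """Check the soft line search criteria (2.13a)-(2.13b) for a step alpha.
    phi(a) = F(x + a*h), dphi(a) = phi'(a); requires 0 < gamma1 < gamma2 < 1."""
    sufficient_decrease = phi(alpha) <= phi(0) + gamma1 * dphi(0) * alpha  # (2.13a), rules out case 2
    not_too_short = dphi(alpha) >= gamma2 * dphi(0)                        # (2.13b), rules out case 1
    return sufficient_decrease and not_too_short

# Hypothetical example: phi(a) = (a - 1)^2, so phi'(0) = -2 < 0 (a descent direction)
phi = lambda a: (a - 1.0) ** 2
dphi = lambda a: 2.0 * (a - 1.0)
print(soft_accept(phi, dphi, 1.0))    # near the minimizer of phi: accepted
print(soft_accept(phi, dphi, 0.01))   # case 1: step too short, rejected
```

A full soft line search would wrap such a test in a bracketing and interpolation loop, as described in Frandsen et al (2004).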
2.4. Trust Region and Damped Methods
Assume that we have a model L of the behaviour of F in the neighbourhood of the current iterate x,

    F(x + h) ≈ L(h) ≡ F(x) + h^T c + ½ h^T B h ,   (2.14)

where c ∈ ℝⁿ and the matrix B ∈ ℝⁿˣⁿ is symmetric. The basic ideas of this section may be generalized to other forms of the model, but in this booklet we only need the form of L given in (2.14). Typically, the model is a second order Taylor expansion of F around x, like the first three terms in the right-hand side of (1.4a), or L(h) may be an approximation to this expansion. It is generally true that such a model is good only when h is sufficiently small. We shall introduce two methods that include this aspect in the determination of a step h, which is a descent direction and which can be used with α = 1 in Algorithm 2.4.

In a trust region method we assume that we know a positive number Δ such that the model is sufficiently accurate inside a ball with radius Δ, centered at x, and determine the step as

    h = h_tr ≡ argmin_{‖h‖≤Δ} { L(h) } .   (2.15)

In a damped method the step is determined as

    h = h_dm ≡ argmin_h { L(h) + ½ μ h^T h } ,   (2.16)

where the damping parameter μ ≥ 0. The term ½ μ h^T h = ½ μ‖h‖² is seen to penalize large steps.
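Setting the gradient of L(h) + ½ μ h^T h to zero gives (B + μI) h_dm = −c, so for given c, B and μ the damped step can be sketched as follows (the sample B and c are hypothetical):

```python
import numpy as np

def damped_step(c, B, mu):
    """Damped step (2.16): minimize L(h) + 1/2*mu*h^T h.
    The stationarity condition is (B + mu*I) h = -c; for mu large enough the
    coefficient matrix is positive definite and the minimizer is unique."""
    n = c.size
    return np.linalg.solve(B + mu * np.eye(n), -c)

B = np.array([[2.0, 0.0], [0.0, -1.0]])   # indefinite model Hessian
c = np.array([1.0, 1.0])
print(damped_step(c, B, mu=2.0))          # mu = 2 makes B + mu*I positive definite
```

Note how the damping also regularizes the problem: even though B itself is indefinite, B + μI is positive definite once μ exceeds the magnitude of B’s most negative eigenvalue.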
The central part of Algorithm 2.4 based on one of these methods has the form

    Compute h by (2.15) or (2.16)
    if F(x + h) < F(x)
        x := x + h
    Update Δ or μ
    (2.17)

This corresponds to α = 1 if the step h satisfies the descending condition (2.1). Otherwise, α = 0, ie we do not move.²⁾ However, we are not stuck

2) There are versions of these methods that include a proper line search to find a point x + αh with smaller F-value, and information gathered during the line search is used in the updating of Δ or μ. For many problems such versions use fewer iteration steps but a larger accumulated number of function values.
at x (unless x = x*): by a proper modification of Δ or μ we aim at having better luck in the next iteration step.

Since L(h) is assumed to be a good approximation to F(x + h) for h sufficiently small, the reason why the step failed is that h was too large, and should be reduced. Further, if the step is accepted, it may be possible to use a larger step from the new iterate and thereby reduce the number of steps needed before we reach x*.

The quality of the model with the computed step can be evaluated by the so-called gain ratio

    ϱ = (F(x) − F(x + h)) / (L(0) − L(h)) ,   (2.18)

ie the ratio between the actual and predicted decrease in function value. By construction the denominator is positive, and the numerator is negative if the step was not downhill: it was too large and should be reduced.
With a trust region method we monitor the step length by the size of the radius Δ. The following updating strategy is widely used,

    if ϱ < 0.25
        Δ := Δ/2
    elseif ϱ > 0.75
        Δ := max{Δ, 3·‖h‖}
    (2.19)
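The gain ratio (2.18) and the updating strategy (2.19) can be sketched together as follows; the numerical values below are hypothetical:

```python
import numpy as np

def gain_ratio(F_x, F_xh, L0, Lh):
    """Gain ratio (2.18): actual decrease divided by predicted decrease."""
    return (F_x - F_xh) / (L0 - Lh)

def update_radius(delta, rho, h, p1=2.0, p2=3.0):
    """Trust region radius update (2.19): shrink on a poor gain ratio,
    allow growth when the model predicts the decrease well."""
    if rho < 0.25:
        return delta / p1              # step too large: use smaller steps
    if rho > 0.75:
        return max(delta, p2 * np.linalg.norm(h))
    return delta                       # otherwise keep the radius unchanged

rho = gain_ratio(F_x=10.0, F_xh=9.0, L0=10.0, Lh=8.0)   # rho = 0.5
print(update_radius(1.0, rho, h=np.array([0.6, 0.8])))  # radius unchanged: 1.0
print(update_radius(1.0, 0.9, h=np.array([0.6, 0.8])))  # rho > 0.75: max(1, 3*1) = 3.0
```

Taking the maximum in the growth branch ensures the radius never shrinks on a successful step, which matters when the constrained minimizer h_tr lies well inside the trust region.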
Thus, if ϱ < 1/4, we decide to use smaller steps, while ϱ > 3/4 indicates that it may be possible to use larger steps. A trust region algorithm is not sensitive to minor changes in the thresholds 0.25 and 0.75, the divisor p₁ = 2 or the factor p₂ = 3, but it is important that the numbers p₁ and p₂ are chosen so that the Δ-va