A New Approach to the Construction of Optimal Designs*

R. H. Hardin
N. J. A. Sloane
AT&T Bell Laboratories
Murray Hill, New Jersey 07974

Present address: AT&T Shannon Labs, Florham Park, NJ 07932

ABSTRACT

By combining a modified version of Hooke and Jeeves' pattern search with exact or Monte Carlo moment calculations, it is possible to find I-, D- and A-optimal (or nearly optimal) designs for a wide range of response-surface problems. The algorithm routinely handles problems involving the minimization of functions of 1000 variables, and so for example can construct designs for a full quadratic response surface depending on 12 continuous process variables. The algorithm handles continuous or discrete variables, linear equality or inequality constraints, and a response surface that is any low-degree polynomial. The design may be required to include a specified set of points, so a sequence of designs can be obtained, each optimal given that the earlier runs have been made. The modeling region need not coincide with the measurement region. The algorithm has been implemented in a program called gosset, which has been used to compute extensive tables of designs. Many of these are more efficient than the best designs previously known.

* A different version of this paper appeared in the Journal of Statistical Planning and Inference, Vol. 37 (1993), 339-369. It is also DIMACS Technical Report 93-47, August 1993.

1. Introduction

This paper presents a new algorithm for constructing experimental designs. The algorithm, described in Section 3, has the following features.

a. The variables may be discrete or continuous (or both; none of these choices are mutually exclusive), discrete variables may be numeric or symbolic, continuous variables may range over a cube or a ball, and the variables may be required to satisfy linear equality or inequality constraints (so mixtures and constrained mixtures can be handled).

b. The model to be fitted may be any low-degree polynomial (for example a quadratic response surface).

c. The number of runs is specified by the user (so minimal designs present no difficulty).

d. The design may be required to include a specified set of points (so a sequence of designs can be found, each of which is optimal given that the earlier measurements have been made).

e. The region where the model is to be fitted need not be the same as the region where measurements are to be made. (So for example designs can be optimized for modeling over a continuous region even if the measurements are constrained to a discrete set. Designs for extrapolation can be obtained in a similar way.)

f. The algorithm is capable of minimizing any differentiable optimality criterion. We have focused on four such criteria, I-, A-, D- and E-optimality, with a distinct preference for the first, which minimizes the average prediction variance (see Section 2). The I-optimality criterion requires knowledge of the moments of the design region, and the algorithm finds these moments either from the exact formulae or by Monte Carlo estimation.

g. The user has control over how much effort is expended by the algorithm, and can if desired monitor the progress of the search. It is not necessary to specify initial points for the search.

h. The algorithm can be used in situations in which the errors, rather than being independent, have a known correlation matrix.
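Feature b allows any low-degree polynomial model, of which the full quadratic response surface is the most common case. As a rough illustration (in Python; the function name is ours and is not part of gosset), the expanded row of the design matrix for a full quadratic model in k variables has p = (k+1)(k+2)/2 entries:

```python
from itertools import combinations

def quadratic_row(x):
    """Expanded model row f(x) for a full quadratic response surface:
    intercept, linear terms x_i, pure quadratics x_i^2, and cross terms
    x_i*x_j, giving p = (k+1)(k+2)/2 entries in all."""
    k = len(x)
    row = [1.0]                                                 # intercept
    row += [xi for xi in x]                                     # linear terms
    row += [xi * xi for xi in x]                                # pure quadratic terms
    row += [x[i] * x[j] for i, j in combinations(range(k), 2)]  # cross terms
    return row

# For k = 12 variables the full quadratic model has p = 13*14/2 = 91 coefficients.
print(len(quadratic_row([0.0] * 12)))  # → 91
```

A design for this model with n runs thus leads to an optimization over n·k coordinates, which is how a 12-variable quadratic design gives rise to a minimization in roughly 1000 variables.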
The algorithm appears to be powerful enough to find optimal or nearly optimal designs involving as many as 1000 continuous variables, for example a full quadratic response-surface design depending on 12 process variables. Its effectiveness decreases as the number of discrete variables increases. However, the algorithm has found D-optimal main-effect designs involving 20 2-level factors (i.e. with 420 discrete variables).

No individual ingredient of our algorithm is new. But although computers have been used by many people to construct experimental designs (surveys of such work can be found for example in [13, Chap. 15], [21], [27], [70], [78], [79], [87], [94], [100]), we believe that no algorithm comparable to ours is presently available.

Implementation. We have implemented the algorithm in the C language in a program called gosset. A built-in parser permits very flexible input. A brief description is given in Section 4, and further details can be found in the users' manual [50]. The program is named after the amateur mathematician Thorold Gosset (1869-1962), who was one of the first to study polytopes in six, seven and eight dimensions [23, p. 164], and his contemporary, the statistician William Sealy Gosset (1876-1937), who was one of the first to use statistical methods in the planning and interpretation of agricultural experiments [82]. Although from our geometric viewpoint their work is related, we do not know if the paths of Thorold (Cambridge, London, lawyer) and William Sealy (Oxford, Dublin, brewer) ever crossed.

Applications. So far there have been two main uses for gosset.
(i) We have attempted to construct optimal (especially I-optimal) designs for a number of "classical" situations, for example linear, quadratic or cubic response-surface designs with k continuous variables in a cube or ball with n runs, over quite a large range of values of k and n, typically 1 ≤ k ≤ 12 and n ranging from the minimal value to 6 (or more) greater than the minimal value. We have also computed designs for similar models and regions in which the variables are discrete. An extensive library of these designs is now built into gosset.

Our work on these "classical" problems can be regarded as an attempt to provide optimal "exact" designs with small numbers of runs to complement the "asymptotic" designs of Kiefer et al. [38], [39]-[47], [63]-[68]. As we shall see in Section 5, some of these designs overlap with and improve on designs already available in the literature. Other designs we have found will be published elsewhere (see [51], [52], [53], [54], [92]). We have also used this collection of designs as data for theoretical investigations. Two results are worth mentioning here.

(a) There is a simple lower bound on the average variance of an I-optimal design for quadratic models with n measurements in a k-dimensional ball (see Eq. (13) below). A number of interesting designs meet the bound. There is a similar bound for D-optimal designs.

(b) It is known that for large numbers of runs D- and G-optimal designs are equivalent [65]. The results of Section 5 show that I-optimal designs are strictly different. For example, I-optimal designs make more measurements at the center of the region and fewer at the boundary. For quadratic models in a k-dimensional ball, k large, an I-optimal design makes about 4/k² of the measurements at the center of the sphere, compared with about 2/k² for D- and G-optimal designs. I-optimality also appears to be a more strict condition than D-optimality.
In situations where the criteria produce similar designs (such as certain linear designs, see Section 5.3), we commonly find that although I-optimal designs are D-optimal, the converse is not necessarily true.

(ii) We have constructed designs for a number of industrial applications, for instance problems involving continuous and discrete variables simultaneously, with linear constraints on the variables (see Section 5.4).

We must emphasize that the theoretical justification for our designs (see Section 2) depends strongly on the validity of the particular model being used. We are assuming that the investigator has carried out some exploratory investigations (for example a screening design) and has identified a region where it is plausible to describe the response by a low-degree polynomial. (Of course the design points at which the initial measurements have been made can be incorporated by our program in the next design; see Section 4.)

This work began in 1990 when a statistician in Seattle, David H. Doehlert, wrote to one of us asking if we could construct designs for a full quadratic response surface depending on k variables in a sphere, where k is between 3 and 14, and in which the number of runs is minimal or close to minimal. We had been using the pattern search algorithm in studying the Tammes problem of placing M points on a k-dimensional sphere so that they are well separated [55], and we found that a similar approach could be used for constructing designs.

2. Choice of optimality criterion

Extensive discussions of the relative merits of A-, D-, E-, G- and I-optimality and of the dangers of relying on any single numerical criterion have appeared in the literature (see for example [13], [15], [48], [64], [87]), and our treatment will be brief.

Suppose for concreteness that we wish to construct a design for a full quadratic response-surface model

    y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \beta_{ij} x_i x_j + \epsilon ,    (1)

where there are k variables x_1, ..., x_k, p = (k+1)(k+2)/2 unknown coefficients β, and the errors ε are independent with mean 0 and variance σ². Let the design consist of n ≥ p points

    [x_{j1}, \ldots, x_{jk}] \quad \text{for } 1 \le j \le n ,    (2)

chosen from a certain region of measurement (or operability) O. Let X be the n × p expanded design matrix, containing one row

    f(x) = [1, x_1, \ldots, x_k, x_1^2, \ldots, x_k^2, x_1 x_2, \ldots, x_{k-1} x_k]

for each design point x = [x_1, ..., x_k], and let M_X = (1/n) X'X denote the matrix of moments of the design measure (the prime indicates matrix transposition). The prediction variance at an arbitrary point x is

    \mathrm{var}\, \hat{y}(x) = \frac{\sigma^2}{n} f(x) M_X^{-1} f(x)' .

We define an I-optimal design (following Box and Draper [11], [12] and others) to be one which minimizes the normalized average or integrated prediction variance

    I = \frac{n}{\sigma^2} \int_R \mathrm{var}\, \hat{y}(x)\, d\mu(x) ,    (3)

where R is the region of interest (or modeling region), and µ is uniform measure on R with total measure 1. This integral simplifies (cf. [12], p. 341) to give

    I = \mathrm{trace}\{M M_X^{-1}\} ,    (4)

where

    M = \int_R f(x)' f(x)\, d\mu(x)    (5)

is the moment matrix of the region of interest. In contrast, A-, D-, E- and G-optimal designs are those which minimize

    A = \mathrm{trace}\, M_X^{-1} ,    (6)
    D = \{\det M_X\}^{-1/p} ,    (7)
    E = \text{maximal eigenvalue of } M_X^{-1} ,    (8)
    G = \text{maximal value of } \mathrm{var}\, \hat{y}(x), \; x \in R    (9)

respectively. We shall refer to the quantities defined in (4), (6)-(9) as the I-, A-, etc. values of the design. As the number of runs n → ∞, these quantities approach limits I∞, A∞, etc., and the I-, A-, etc. efficiencies of the design are given by

    I_\infty/I , \quad A_\infty/A , \quad D_\infty/D , \quad E_\infty/E , \quad G_\infty/G    (10)

(cf. [2], [70]). We are most interested in minimizing the prediction variance var ŷ(x) for x ∈ R, which suggests the use of I- or G-optimality. On the other hand our algorithm requires that the criterion be differentiable, which holds for A-, D- and I- but not E- or G-optimality.
Furthermore we wish to be able to construct designs for which the measurement region O and modeling region R are distinct, which also picks out the I- or G-criteria (since the A-, D- and E-criteria do not involve R). We have therefore chosen I-optimality as our primary criterion. However, the implementation of the algorithm allows the user to search for any of I-, A-, D- or E-optimal designs.

It is worth mentioning that the I-value (Eq. (4)) is a dimensionless quantity. Also, provided the model has the property that if it contains one monomial of degree d then it contains all possible monomials of degree ≤ d, the I-value is unchanged if the variables are rescaled or if an orthogonal transformation is applied to the variables. In contrast the A- and E-values of a design depend on the particular choice of coordinate axes used, a property which seems unsatisfactory.

In the present paper we shall concentrate on I- and D-efficiencies. For the ball or cube it is not difficult to determine I∞ and D∞ exactly. In other cases we determined them empirically, by estimating the limiting values of I and D as the number of runs increases. We attempted to calculate I∞ and D∞ to at least three decimal places of accuracy. However, inaccuracies in I∞ and D∞ are not too significant, because they do not change the relative efficiencies of comparable designs. Note that a design is I- (or D-) optimal if it has the highest I- (or D-) efficiency for a given number of runs, and so a design can be optimal without being 100% efficient.

To guard against the dangers of using any single number as a design criterion, we have plotted variance dispersion graphs (Giovannitti-Jensen and Myers [48], Vining [96]) for our designs; some examples will be found in [51]. These graphs show how the prediction variance var ŷ(x) varies over the region of interest, and also give the G-value of the design.
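For a concrete feel for the quantities in Eqs. (4), (6) and (7), the sketch below (our own illustration in Python, not code from gosset) evaluates the I-, A- and D-values of the 3-run design {-1, 0, 1} for a one-variable quadratic model, using the exact uniform moments of R = [-1, 1]:

```python
import numpy as np

# Model f(x) = [1, x, x^2]; design points -1, 0, 1; region R = [-1, 1].
X = np.array([[1.0, x, x * x] for x in (-1.0, 0.0, 1.0)])
n, p = X.shape
MX = X.T @ X / n                          # moment matrix of the design measure

# Moment matrix M of R: entries are the uniform moments E[x^m] on [-1, 1],
# namely 1, 0, 1/3, 0, 1/5 for m = 0..4.
M = np.array([[1,   0,   1/3],
              [0,   1/3, 0  ],
              [1/3, 0,   1/5]])

MX_inv = np.linalg.inv(MX)
I_value = np.trace(M @ MX_inv)            # Eq. (4)
A_value = np.trace(MX_inv)                # Eq. (6)
D_value = np.linalg.det(MX) ** (-1 / p)   # Eq. (7)
# For this design: I = 2.4, A = 9.0, D = (27/4)^(1/3) ≈ 1.89.
```

Note that A and D depend only on the design through M_X, while I also involves the moment matrix M of the modeling region, which is why only the I- (and G-) criteria can distinguish different choices of R.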
We conclude this section by briefly mentioning some earlier papers that look for I-optimal designs.

Studden [95] gives a method for constructing approximations to I-optimal designs for 1-dimensional problems.

Haines [49], Meyer and Nachtsheim [72], Crary [24] and Snow and Crary [93] all use simulated annealing to search for I-optimal designs. This approach seems to be restricted to low dimensions and small numbers of design points, and furthermore is less successful at finding optimal designs than our approach. For example, gosset finds slightly different designs in two of the four cases shown in Figure 1 of [24]. Meyer and Nachtsheim [72] give several quite small examples where simulated annealing was unable to find the best designs known. One of these, a 17-run four-dimensional example, is described in Section 5.2 below.

Crary's [24] program, I-OPT, like ours, allows the measurement and modeling regions to be distinct. I-OPT has a feature that gosset does not have at present, namely the ability to specify relative weights for points in the modeling region.

3. The algorithm

For a given model (such as (1)) involving k variables and p unknown coefficients, a design consists of n points (2) (not necessarily distinct) in the region O. Let x be a vector specifying the coordinates of all the design points. We wish to choose the design vector x so as to minimize a certain differentiable function F(x). For I-optimal designs, F is the normalized average prediction variance I = trace M M_X^{-1}, where M is the moment matrix of the region R (see (4), (5)), while for A- or D-optimality F is trace M_X^{-1} or {det M_X}^{-1/p} (see (6), (7)). We let g(x) denote the gradient ∇F(x), normalized to have length √n.

The regions O and R can be quite complicated. The present implementation (see Section 4) permits O and R to be a product of cubes, balls and finite sets, possibly intersected by hyperplanes and half-spaces.
This includes simplices, of course, since these are intersections of cubes and hyperplanes. If there are no equality constraints in the problem then the design vector x is simply a concatenation of the individual points (2) of the design. When equality constraints are present we choose new, internal, variables to parametrize the space O, and then x is a concatenation of the coordinates of the internal variables.

If the region O is a simply connected space such as a ball, our basic strategy is to minimize F using a modified version of the Hooke and Jeeves [59] pattern search. An excellent description is given by Beightler et al. [7], and our discussion here will be brief. We have modified the algorithm to make use of the gradient of F. Roughly speaking, pattern search minimizes F by accelerating down the gradient, and setting the velocity to zero whenever the function does not improve. The velocity increases as long as improvements continue.

We begin with an initial design vector x = x^{(0)}, with velocity v^{(0)} = 0. This initial design may be obtained by choosing n random points from O, although in the implementation the user has the option of specifying x^{(0)} himself. We attempt to define the next vector by

    x^{(i+1)} = x^{(i)} + v^{(i+1)} ,    (11)

where the velocity vector v^{(i+1)} is given by

    v^{(i+1)} = v^{(i)} - s\, g(x^{(i)}) ,    (12)

for i ≥ 0, where s is the current step size. If F(x^{(i+1)}) < F(x^{(i)}) the step is a success, we accept this value for x^{(i+1)}, and repeat the iteration with s multiplied by 1.04. Otherwise we set v^{(i)} = 0 and try (11) and (12) again. If there is still no reduction in F, we divide the step size by 2, and if it is larger than some small accuracy limit, try (11) and (12) again. Otherwise we terminate this attempt, pick another starting design x^{(0)}, and repeat the whole procedure. Any particular sequence F(x^{(0)}), F(x^{(1)}), F(x^{(2)}), ...
(a decreasing, hence bounded, sequence of positive numbers) is terminated either when the improvement is less than some small accuracy limit, or (very rarely) when the number of steps exceeds a specified limit. After a specified number of attempts the algorithm terminates, returning the best and the most recent second-best designs found.

Equations (11) and (12) must be modified when the design vector x^{(n)} involves points near the boundary of O (either the natural boundary, for instance when O is a ball, cube or simplex, or a boundary imposed by a constraint on the variables). When this happens, the components of the gradient vector crossing the boundary are simply zeroed before normalization, and if any point in x^{(n)} nevertheless moves outside the boundary we move it to the closest point on the boundary. Incidentally, in the implementation the gradient is calculated from the exact formula for ∇F, rather than by numerical differentiation of F.

We have found this method very robust and efficient, typically minimizing a function of N variables in N steps. The above procedure is used when all or most of the variables in the design are continuous. When many discrete variables are present we use a collection of heuristic algorithms (which do however work most of the time).

Suppose for concreteness that there are k = 6 variables, the first four of which are continuous and the last two discrete. Let the n points of the design be

    a = a_1 \cdots a_6, \quad b = b_1 \cdots b_6, \quad \ldots, \quad c = c_1 \cdots c_6 ,

so that the design vector is x = ab···c. We first discuss the question of choosing the discrete coordinates a_5, a_6, b_5, b_6, ..., c_5, c_6 of the initial design vector x^{(0)}.
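The velocity scheme of Eqs. (11)-(12) can be sketched in a few lines. The following Python is our own simplified, unconstrained illustration (unit-length gradient normalization, no boundary handling, a single start), not the gosset implementation:

```python
import numpy as np

def pattern_search(F, grad, x0, s=0.1, growth=1.04, tol=1e-10, max_steps=10000):
    """Gradient-accelerated pattern search in the spirit of Eqs. (11)-(12):
    accelerate down the gradient, zero the velocity when the objective stops
    improving, and halve the step size when even a fresh step fails."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    fx = F(x)
    for _ in range(max_steps):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm > 0:
            g = g / norm               # normalized gradient direction
        v_new = v - s * g              # Eq. (12)
        x_new = x + v_new              # Eq. (11)
        f_new = F(x_new)
        if f_new < fx:                 # success: accept, speed up by 4%
            x, v, fx = x_new, v_new, f_new
            s *= growth
        elif np.any(v != 0):           # failure: kill the velocity, retry
            v = np.zeros_like(x)
        elif s > tol:                  # still failing: halve the step size
            s /= 2
        else:
            break                      # step size below the accuracy limit
    return x, fx

# Toy run: minimize F(x) = |x - (1,1)|^2 starting from (3, -2).
x_opt, f_opt = pattern_search(lambda x: np.sum((x - 1.0) ** 2),
                              lambda x: 2.0 * (x - 1.0),
                              [3.0, -2.0])
```

In the algorithm proper, F is the I-, A- or D-value of (4), (6) or (7), g is normalized to length √n, boundary-crossing gradient components are zeroed, and the whole search is restarted from several initial designs.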
For this we use one (or more) of three strategies: (i) run sequentially through all possibilities in some fixed order, running through them all once before repeating any; (ii) choose them at random from the set of all discrete possibilities; (iii) choose them at random from the underlying continuous space. If constraints are present, in all three cases only points satisfying the constraints are used. Strategies (i) and (ii) are reasonable when the number of combinations of discrete variables is not much larger than the number of design points; otherwise (iii) appears to be more successful. In the implementation the user can specify the strategy to be used, or accept the default strategy (which depends on the type of problem).

Second, in the optimization step, we optimize (using the method described earlier in this section) either just the continuous components of x, keeping the discrete components fixed, or all components, treating the discrete components as continuous. If the latter strategy is adopted, a final step is needed to convert the discrete components (a_5, a_6, etc.) back to their permitted discrete values. For this discretization step we again use one of three methods, replacing each continuous component by: the closest discrete value, whichever of the two closest discrete values gives the smallest value of F, or whichever of the discrete values gives