A New Approach to the Construction of Optimal Designs*
R. H. Hardin
N. J. A. Sloane
AT&T Bell Laboratories
Murray Hill, New Jersey 07974
Present address: AT&T Shannon Labs, Florham Park, NJ 07932
ABSTRACT
By combining a modified version of Hooke and Jeeves’ pattern search with exact or Monte Carlo
moment calculations, it is possible to find I-, D- and A-optimal (or nearly optimal) designs
for a wide range of response-surface problems. The algorithm routinely handles problems
involving the minimization of functions of 1000 variables, and so for example can construct
designs for a full quadratic response-surface depending on 12 continuous process variables. The
algorithm handles continuous or discrete variables, linear equality or inequality constraints, and
a response surface that is any low degree polynomial. The design may be required to include
a specified set of points, so a sequence of designs can be obtained, each optimal given that the
earlier runs have been made. The modeling region need not coincide with the measurement
region. The algorithm has been implemented in a program called gosset, which has been used
to compute extensive tables of designs. Many of these are more efficient than the best designs
previously known.
∗A different version of this paper appeared in the Journal of Statistical Planning and Inference, Vol. 37
(1993), 339–369. It is also DIMACS Technical Report 93–47, August 1993.
1. Introduction
This paper presents a new algorithm for constructing experimental designs. The algorithm,
described in Section 3, has the following features.
a. The variables may be discrete or continuous (or a mixture of both); discrete variables may
be numeric or symbolic, continuous variables may range
over a cube or a ball, and the variables may be required to satisfy linear equality or
inequality constraints (so mixtures and constrained mixtures can be handled).
b. The model to be fitted may be any low degree polynomial (for example a quadratic
response-surface).
c. The number of runs is specified by the user (so minimal designs present no difficulty).
d. The design may be required to include a specified set of points (so a sequence of designs
can be found, each of which is optimal given that the earlier measurements have been
made).
e. The region where the model is to be fitted need not be the same as the region where
measurements are to be made. (So for example designs can be optimized for modeling
over a continuous region even if the measurements are constrained to a discrete set.
Designs for extrapolation can be obtained in a similar way.)
f. The algorithm is capable of minimizing any differentiable optimality criterion. We have
focused on four such criteria, I-, A-, D- and E-optimality, with a distinct preference
for the first, which minimizes the average prediction variance (see Section 2). The I-
optimality criterion requires knowledge of the moments of the design region, and the
algorithm finds these moments either from the exact formulae or by Monte Carlo esti-
mation.
g. The user has control over how much effort is expended by the algorithm, and can if
desired monitor the progress of the search. It is not necessary to specify initial points
for the search.
h. The algorithm can be used in situations in which the errors, rather than being indepen-
dent, have a known correlation matrix.
The algorithm appears to be powerful enough to find optimal or nearly optimal designs
involving as many as 1000 continuous variables, for example a full quadratic response-surface
design depending on 12 process variables. Its effectiveness decreases as the number of discrete
variables increases. However, the algorithm has found D-optimal main-effect designs involving
20 2-level factors (i.e. with 420 discrete variables).
No individual ingredient of our algorithm is new. But although computers have been
used by many people to construct experimental designs (surveys of such work can be found
for example in [13, Chap. 15], [21], [27], [70], [78], [79], [87], [94], [100]), we believe that no
algorithm comparable to ours is presently available.
Implementation. We have implemented the algorithm in the C language in a program called
gosset. A built-in parser permits a very flexible input. A brief description is given in Section 4,
and further details can be found in the users’ manual [50].
The program is named after the amateur mathematician Thorold Gosset (1869–1962), who
was one of the first to study polytopes in six, seven and eight dimensions [23, p. 164], and
his contemporary, the statistician William Seally Gosset (1876–1937), who was one of the first
to use statistical methods in the planning and interpretation of agricultural experiments [82].
Although from our geometric viewpoint their work is related, we do not know if the paths
of Thorold (Cambridge, London, lawyer), and William Seally (Oxford, Dublin, brewer) ever
crossed.
Applications. So far there have been two main uses for gosset.
(i) We have attempted to construct optimal (especially I-optimal) designs for a number
of “classical” situations, for example linear, quadratic or cubic response-surface designs with
k continuous variables in a cube or ball with n runs, over quite a large range of values of k
and n, typically 1 ≤ k ≤ 12 and n ranging from the minimal value to 6 (or more) greater than
the minimal value. We have also computed designs for similar models and regions in which
the variables are discrete. An extensive library of these designs is now built into gosset. Our
work on these “classical” problems can be regarded as an attempt to provide optimal “exact”
designs with small numbers of runs to complement the “asymptotic” designs of Kiefer et al.
[38], [39]–[47],[63]–[68].
As we shall see in Section 5, some of these designs overlap with and improve on designs
already available in the literature. Other designs we have found will be published elsewhere
(see [51], [52], [53], [54], [92]).
We have also used this collection of designs as data for theoretical investigations. Two
results are worth mentioning here.
(a) There is a simple lower bound on the average variance of an I-optimal design for quadratic
models with n measurements in a k-dimensional ball (see Eq. (13) below). A number of
interesting designs meet the bound. There is a similar bound for D-optimal designs.
(b) It is known that for large numbers of runs D- and G-optimal designs are equivalent [65].
The results of Section 5 show that I-optimal designs are strictly different. For example,
I-optimal designs make more measurements at the center of the region and fewer at the
boundary. For quadratic models in a k-dimensional ball, k large, an I-optimal design
makes about 4/k² of the measurements at the center of the sphere, compared with
about 2/k² for D- and G-optimal designs. I-optimality also appears to be a stricter
condition than D-optimality. In situations where the criteria produce similar designs
(such as certain linear designs, see Section 5.3), we commonly find that although I-
optimal designs are D-optimal, the converse is not necessarily true.
(ii) We have constructed designs for a number of industrial applications, for instance
problems involving continuous and discrete variables simultaneously, with linear constraints on
the variables (see Section 5.4).
We must emphasize that the theoretical justification for our designs (see Section 2) de-
pends strongly on the validity of the particular model being used. We are assuming that the
investigator has carried out some exploratory investigations (for example a screening design)
and has identified a region where it is plausible to describe the response by a low degree poly-
nomial. (Of course the design points when the initial measurements have been made can be
incorporated by our program in the next design — see Section 4.)
This work began in 1990 when a statistician in Seattle, David H. Doehlert, wrote to one
of us asking if we could construct designs for a full quadratic response-surface depending on k
variables in a sphere, where k is between 3 and 14, and in which the number of runs is minimal
or close to minimal. We had been using the pattern search algorithm in studying the Tammes
problem of placing M points on a k-dimensional sphere so they are well-separated [55], and
we found that a similar approach could be used for constructing designs.
2. Choice of optimality criterion
Extensive discussions of the relative merits of A-, D-, E-, G- and I-optimality and of the
dangers of relying on any single numerical criterion have appeared in the literature (see for
example [13], [15], [48], [64], [87]), and our treatment will be brief.
Suppose for concreteness that we wish to construct a design for a full quadratic response-
surface model
y = β0 + Σ_{i=1}^k βi xi + Σ_{i=1}^k βii xi² + Σ_{i=1}^{k−1} Σ_{j=i+1}^k βij xi xj + ε ,   (1)

where there are k variables x1, . . . , xk, p = (k + 1)(k + 2)/2 unknown coefficients β, and the
errors ε are independent with mean 0 and variance σ². Let the design consist of n ≥ p points
[xj1, . . . , xjk] for 1 ≤ j ≤ n , (2)
chosen from a certain region of measurement (or operability) O. Let X be the n× p expanded
design matrix, containing one row
f(x) = [1, x1, . . . , xk, x1², . . . , xk², x1x2, . . . , xk−1xk]
for each design point x = [x1, . . . , xk], and let
MX = (1/n) X′X
denote the matrix of moments of the design measure (the prime indicates matrix transposition).
The prediction variance at an arbitrary point x is
var ŷ(x) = (σ²/n) f(x) MX⁻¹ f(x)′ .
We define an I-optimal design (following Box and Draper [11], [12] and others) to be one
which minimizes the normalized average or integrated prediction variance
I = (n/σ²) ∫_R var ŷ(x) dµ(x) ,   (3)
where R is the region of interest (or modeling region), and µ is uniform measure on R with
total measure 1. This integral simplifies (cf. [12], p. 341) to give
I = trace{M MX⁻¹} ,   (4)
where
M = ∫_R f(x)′ f(x) dµ(x)   (5)
is the moment matrix of the region of interest.
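The I-value (4) can then be computed once M is known, and M can be estimated by Monte Carlo when exact moment formulae are unavailable, as the algorithm does. A minimal Python illustration (our own sketch, with an assumed rejection sampler for the unit ball):

```python
import numpy as np

def quad_terms(x):
    """Expanded model vector f(x) for the full quadratic model (1)."""
    k = len(x)
    return np.array([1.0, *x,
                     *[x[i] ** 2 for i in range(k)],
                     *[x[i] * x[j] for i in range(k - 1) for j in range(i + 1, k)]])

def i_value(design, region_samples):
    """I = trace(M MX^-1), Eq. (4), with the region moment matrix M of
    Eq. (5) estimated by Monte Carlo over region_samples."""
    X = np.array([quad_terms(p) for p in design])
    MX = X.T @ X / len(design)
    F = np.array([quad_terms(x) for x in region_samples])
    M = F.T @ F / len(region_samples)       # Monte Carlo estimate of Eq. (5)
    return float(np.trace(M @ np.linalg.inv(MX)))

def ball_samples(k, n, rng):
    """Uniform points in the unit k-ball by rejection from the cube."""
    pts = []
    while len(pts) < n:
        x = rng.uniform(-1.0, 1.0, size=k)
        if x @ x <= 1.0:
            pts.append(x)
    return pts
```

A useful sanity check: if the design measure has exactly the moments of the region (e.g. the design points are the sample points themselves), then MX = M and I = p, the number of coefficients.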
In contrast, A-, D-, E- and G-optimal designs are those which minimize
A = trace MX⁻¹ ,   (6)
D = {det MX}^{−1/p} ,   (7)
E = maximal eigenvalue of MX⁻¹ ,   (8)
G = maximal value of var ŷ(x), x ∈ R   (9)
respectively. We shall refer to the quantities defined in (4), (6)–(9) as the I-, A-, etc. values
of the design. As the number of runs n → ∞, these quantities approach limits I∞, A∞, etc.,
and the I-, A-, etc. efficiencies of the design are given by
I∞/I , A∞/A , D∞/D , E∞/E , G∞/G   (10)
(cf. [2], [70]).
We are most interested in minimizing the prediction variance, var ŷ(x), for x ∈ R, which
suggests the use of I- or G-optimality. On the other hand our algorithm requires that the
criterion be differentiable, which holds for A-, D- and I- but not E- or G-optimality. Fur-
thermore we wish to be able to construct designs for which the measurement region O and
modeling region R are distinct, which also picks out the I- or G-criteria (since the A-, D- and
E-criteria do not involve R). We have therefore chosen I-optimality as our primary criterion.
However, the implementation of the algorithm allows the user to search for any of I-, A-, D-
or E-optimal designs.
It is worth mentioning that the I-value (Eq. (4)) is a dimensionless quantity. Also, provided
the model has the property that if it contains one monomial of degree d then it contains all
possible monomials of degree ≤ d, then the I-value is unchanged if the variables are rescaled
or if an orthogonal transformation is applied to the variables. In contrast the A- and E-values
of a design depend on the particular choice of coordinate axes used, a property which seems
unsatisfactory. In the present paper we shall concentrate on I- and D-efficiencies. For the
ball or cube it is not difficult to determine I∞ and D∞ exactly. In other cases we determined
them empirically, by estimating the limiting values of I and D as the number of runs increases.
We attempted to calculate I∞ and D∞ to at least three decimal places of accuracy. However,
inaccuracies in I∞ and D∞ are not too significant, because they do not change the relative
efficiencies of comparable designs. Note that a design is I- (or D-) optimal if it has the highest
I- (or D-) efficiency for a given number of runs, and so a design can be optimal without being
100% efficient.
To guard against the dangers of using any single number as a design criterion, we have
plotted variance dispersion graphs (Giovannitti-Jensen and Myers [48], Vining [96]) for our
designs; some examples will be found in [51]. These graphs show how the prediction variance
var ŷ(x) varies over the region of interest, and also give the G-value of the design.
We conclude this section by briefly mentioning some earlier papers that look for I-optimal
designs.
Studden [95] gives a method for constructing approximations to I-optimal designs for 1-
dimensional problems.
Haines [49], Meyer and Nachtsheim [72], Crary [24] and Snow and Crary [93] all use sim-
ulated annealing to search for I-optimal designs. This approach seems to be restricted to low
dimensions and small numbers of design points, and furthermore is less successful at finding
optimal designs than our approach. For example, gosset finds slightly different designs in two
of the four cases shown in Figure 1 of [24]. Meyer and Nachtsheim [72] give several quite small
examples where simulated annealing was unable to find the best designs known. One of these,
a 17-run four-dimensional example, is described in Section 5.2 below.
Crary’s [24] program, I-OPT, like ours, allows the measurement and modeling regions to
be distinct. I-OPT has a feature that gosset does not have at present, namely the ability to
specify relative weights for points in the modeling region.
3. The algorithm
For a given model (such as (1)) involving k variables and p unknown coefficients, a design
consists of n points (2) (not necessarily distinct) in the region O. Let x be a vector specifying
the coordinates of all the design points. We wish to choose the design vector x so as to
minimize a certain differentiable function F(x). For I-optimal designs, F is the normalized
average prediction variance I = trace{M MX⁻¹}, where M is the moment matrix of the region
R (see (4), (5)), while for A- or D-optimality F is trace MX⁻¹ or {det MX}^{−1/p} (see (6), (7)).
We let g(x) denote the gradient ∇F (x), normalized to have length √n.
The regions O and R can be quite complicated. The present implementation (see Section 4)
permits O and R to be a product of cubes, balls and finite sets, possibly intersected by
hyperplanes and half-spaces. This includes simplices, of course, since these are intersections of
cubes and hyperplanes.
If there are no equality constraints in the problem then the design vector x is simply a
concatenation of the individual points (2) of the design. When equality constraints are present
we choose new, internal, variables to parametrize the space O, and then x is a concatenation
of the coordinates of the internal variables.
If the region O is a simple connected space such as a ball, our basic strategy is to minimize
F using a modified version of the Hooke and Jeeves [59] pattern search. An excellent description
is given by Beightler et al. [7], and our discussion here will be brief. We have modified the
algorithm to make use of the gradient of F . Roughly speaking, pattern search minimizes F
by accelerating down the gradient, and setting the velocity to zero whenever the function does
not improve. The velocity increases as long as improvements continue.
We begin with an initial design vector x = x(0), with velocity v(0) = 0. This initial design
may be obtained by choosing n random points from O, although in the implementation the
user has the option of specifying x(0) himself. We attempt to define the next vector by
x(i+1) = x(i) + v(i+1) , (11)
where the velocity vector v(i+1) is given by
v(i+1) = v(i) − sg(x(i)) , (12)
for i ≥ 0, where s is the current step size. If F (x(i+1)) < F (x(i)) the step is a success, we
accept this value for x(i+1), and repeat the iteration with s multiplied by 1.04. Otherwise
we set v(i) = 0 and try (11) and (12) again. If there is still no reduction in F , we divide
the step size by 2, and if it is larger than some small accuracy limit, try (11) and (12) again.
Otherwise we terminate this attempt, pick another starting design x(0), and repeat the whole
procedure. Any particular sequence F (x(0)), F (x(1)), F (x(2)), . . . (a decreasing, hence bounded,
sequence of positive numbers) is terminated either when the improvement is less than some
small accuracy limit, or (very rarely) when the number of steps exceeds a specified limit. After
a specified number of attempts the algorithm terminates, returning the best and the most
recent second-best designs found.
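The acceleration scheme of (11) and (12) can be sketched in a few lines of Python. This is an illustrative reconstruction, not gosset's actual code: the 1.04 growth factor, the halving rule, and the √n gradient normalization come from the text, while the initial step size, accuracy limit, and function names are our assumptions.

```python
import numpy as np

def pattern_search(F, grad, x0, s=0.1, s_min=1e-8, max_steps=10000):
    """Gradient-accelerated pattern search: accelerate down the gradient,
    zero the velocity when the objective stops improving, halve the step
    if that also fails, and stop when the step falls below s_min."""
    n = len(x0)
    x = np.asarray(x0, dtype=float)
    v = np.zeros(n)
    fx = F(x)
    for _ in range(max_steps):
        g = grad(x)
        g = g * np.sqrt(n) / np.linalg.norm(g)   # normalize gradient to length sqrt(n)
        v_new = v - s * g                        # Eq. (12): update the velocity
        x_new = x + v_new                        # Eq. (11): tentative next iterate
        fx_new = F(x_new)
        if fx_new < fx:                          # success: accept and grow the step
            x, v, fx = x_new, v_new, fx_new
            s *= 1.04
            continue
        v = np.zeros(n)                          # failure: zero the velocity and retry
        x_new = x - s * g
        fx_new = F(x_new)
        if fx_new < fx:
            x, fx = x_new, fx_new
            continue
        s /= 2.0                                 # still no improvement: halve the step
        if s < s_min:
            break
    return x, fx
```

On a smooth convex objective this behaves as described in the text: the velocity builds up while improvements continue, and the step size shrinks geometrically near a minimum.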
Equations (11) and (12) must be modified when the design vector x(i) involves points near
the boundary of O (either the natural boundary, for instance when O is a ball, cube or simplex,
or a boundary imposed by a constraint on the variables). When this happens, the components
of the gradient vector crossing the boundary are simply zeroed before normalization, and if
any point in x(i) nevertheless moves outside the boundary we move it to the closest point on
the boundary.
Incidentally, in the implementation the gradient is calculated from the exact formula for
∇F , rather than by numerical differentiation of F .
We have found this method very robust and efficient, typically minimizing a function of N
variables in N steps.
The above procedure is used when all or most of the variables in the design are continuous.
When many discrete variables are present we use a collection of heuristic algorithms (which
do however work most of the time).
Suppose for concreteness that there are k = 6 variables, the first four of which are continuous
and the last two discrete. Let the n points of the design be
a = a1 · · · a6, b = b1 · · · b6, . . . , c = c1 · · · c6 ,
so that the design vector is x = ab · · · c.
We first discuss the question of choosing the discrete coordinates a5, a6, b5, b6, . . . , c5,
c6 of the initial design vector x(0). For this we use one (or more) of three strategies: (i) run
sequentially through all possibilities in some fixed order, running through them all once before
repeating any; (ii) choose them at random from the set of all discrete possibilities; (iii) choose
them at random from the underlying continuous space. If constraints are present, in all three
cases only points satisfying the constraints are used. Strategies (i) and (ii) are reasonable
when the number of combinations of discrete variables is not much larger than the number of
design points, otherwise (iii) appears to be more successful. In the implementation the user
can specify the strategy to be used, or accept the default strategy (which depends on the type
of problem).
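The three initialization strategies might be sketched as follows. This is illustrative Python, not gosset's interface: the `levels` representation (a list of allowed values per discrete variable) and the function name are our assumptions.

```python
import itertools
import random

def init_discrete(levels, n_points, strategy, rng):
    """Choose initial values for the discrete coordinates of n_points design
    points, using one of the three strategies described in the text."""
    if strategy == "sequential":
        # (i) run through all combinations in a fixed order, covering each
        # once before repeating any
        cycle = itertools.cycle(itertools.product(*levels))
        return [next(cycle) for _ in range(n_points)]
    if strategy == "random-discrete":
        # (ii) choose at random from the set of all discrete possibilities
        return [tuple(rng.choice(l) for l in levels) for _ in range(n_points)]
    if strategy == "random-continuous":
        # (iii) choose at random from the underlying continuous space
        return [tuple(rng.uniform(min(l), max(l)) for l in levels)
                for _ in range(n_points)]
    raise ValueError(strategy)
```

As the text notes, (i) and (ii) suit problems where the number of discrete combinations is comparable to the number of design points; (iii) is preferable when the combinations greatly outnumber the points.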
Second, in the optimization step, we optimize (using the method described earlier in this
section) either just the continuous components of x, keeping the discrete components fixed, or
all components, treating the discrete components as continuous.
If the latter strategy is adopted, a final step is needed to convert the discrete components
(a5, a6, etc.) back to their permitted discrete values. For this discretization step we again
use one of three methods, replacing each continuous component by: the closest discrete value,
whichever of the two closest discrete values gives the smaller value of F, or whichever of all the
permitted discrete values gives the smallest value of F.
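These discretization rules can be sketched as follows. This is an illustrative Python fragment; `F_of` is a hypothetical hook that evaluates the design criterion with the component in question set to a given level, and is not part of gosset.

```python
def discretize(value, levels, F_of=None, method="nearest"):
    """Map one continuous component back to a permitted discrete level,
    using one of the three rules described in the text. F_of(level) is a
    hypothetical callback returning the criterion value F with this
    component replaced by `level` (all other components held fixed)."""
    levels = sorted(levels)
    if method == "nearest":
        # rule 1: the closest discrete value
        return min(levels, key=lambda l: abs(l - value))
    ranked = sorted(levels, key=lambda l: abs(l - value))
    if method == "best-of-two":
        # rule 2: whichever of the two closest levels gives the smaller F
        return min(ranked[:2], key=F_of)
    if method == "best-of-all":
        # rule 3: whichever of all permitted levels gives the smallest F
        return min(levels, key=F_of)
    raise ValueError(method)
```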