LOCAL SPATIAL AUTOCORRELATION STATISTICS DISTRIBUTIONAL ISSUES AND AN APPLICATION


J. K. Ord and Arthur Getis

The statistics $G_i(d)$ and $G_i^*(d)$, introduced in Getis and Ord (1992) for the study of local pattern in spatial data, are extended and their properties further explored. In particular, nonbinary weights are allowed and the statistics are related to Moran's autocorrelation statistic, $I$. The correlations between nearby values of the statistics are derived and verified by simulation. A Bonferroni criterion is used to approximate significance levels when testing extreme values from the set of statistics. An example of the use of the statistics is given using spatial-temporal data on the AIDS epidemic centering on San Francisco. Results indicate that in recent years the disease is intensifying in the counties surrounding the city.

1. INTRODUCTION

In spatial data analysis, it is often necessary to determine whether or not identifiable spatial patterns exist. For example, we may test for spatial pattern by focusing on the locations of the sample points, by studying the values associated with these locations given the sampling pattern, or by combining these analyses. There are many ways to test for the existence of such patterns; perhaps the most popular is Moran's I statistic, which is used to test the null hypothesis that the spatial autocorrelation of a variable is zero. If the null hypothesis is rejected, the variable is said to be spatially autocorrelated. Traditional analyses such as nearest neighbor, K-function analysis, and the semivariogram are all widely used to study spatial patterns. All of these statistics are global in the sense that they require measurements from all or many geo-referenced points in the sample.

This research was supported by the National Science Foundation, grant no. SES-9123832.
The authors thank Cleridy Lennert of Scripps Institution of Oceanography and Serge Rey of San Diego State University for making programming suggestions, and Xioake Zhang and Long Gen Ying of San Diego State University for gathering data and carrying out the simulations. They are grateful to Luc Anselin and the referees for their insightful comments, which led to considerable improvements in the paper. J. K. Ord is the David H. McKinley Professor of Business Administration in the department of management science and information systems at The Pennsylvania State University. Arthur Getis is Stephen and Mary Birch Professor of Geographical Studies in the department of geography at San Diego State University.

Geographical Analysis, Vol. 27, No. 4 (October 1995) © Ohio State University Press

In recent years, there has been growing interest in local measures of spatial dependence. Much of this work has been inspired by the search for valid tests for the clustering of cases of rare diseases; see, for example, Stone (1988), Cuzick and Edwards (1990), and Cressie (1992). Besag and Newell (1991) draw a useful distinction between general and focused tests. General tests are concerned with overall patterns in a large region, whereas focused tests "concentrate upon one or more smaller regions selected ostensibly because of some factor (for example, the location of a nuclear installation) that has been previously hypothesized to be associated with the disease." Besag and Newell (1991) go on to discuss tests for the detection of clusters, whose purpose is to identify "hot spots" without any preconceptions about their locations. It is apparent that a test for hot spots could serve the same role as a focused test, in that the hot spot should emerge from the pack if its local structure is sufficiently unusual. Furthermore, such an approach affords some protection against the biases that may arise when only selected areas are tested.
Indeed, focused tests must rely upon either the availability of reference data for similar areas well removed from the putative source (Cuzick and Edwards 1990) or an adjustment to the distribution of the test statistic to compensate for the search for hot spots (cf. Stone 1988). In this regard, Besag and Newell (1991) point out the difficulties inherent in the original version of the Geographical Analysis Machine (GAM) introduced by Openshaw et al. (1987); they provide a modified analysis to overcome these difficulties. Besag and Newell also point out that when region i has $n_i$ cases of a disease in a population of $t_i$, the random permutations distribution for the Moran statistic may not be an appropriate frame of reference. This difficulty may arise when urban (high $t_i$) areas tend to be clustered, and likewise rural (low $t_i$) areas. Provided that the $\{n_i\}$ and $\{t_i\}$ are not too small, the difficulty may be resolved by using the pooled incidence estimator $p = \sum_i n_i / \sum_i t_i$ and then computing the standard score for each region as

$$z_i = \frac{n_i - p\,t_i}{[p(1 - p)\,t_i]^{1/2}}. \qquad (1)$$

The focus of this paper is a pair of tests for the detection of clusters, introduced by Getis and Ord (1992). These statistics are especially useful in cases where global statistics may fail to alert the researcher to significant pockets of clustering. For example, Getis and Ord showed that the distribution of Sudden Infant Death Syndrome in North Carolina for the period 1979-84 did not display any global spatial pattern, but that a few counties in the southern part of the state displayed a clustering of cases.

2. STATEMENT OF THE PROBLEM

Consider an area subdivided into n regions, i = 1, 2, ..., n, where each region is identified with a point whose Cartesian coordinates are known. Each i has associated with it a value $x_i$ that represents an observation upon the random variable $X_i$. Typically, it will be assumed that the $X_i$ have identical marginal distributions; further, if the $X_i$ are independent, we say that there is no spatial structure. Independence implies the absence of spatial autocorrelation, but the converse is not necessarily true. Nevertheless, tests for spatial autocorrelation are typically viewed as adequate assessments of dependence. Usually, if spatial autocorrelation exists, it will be exhibited by similarities between contiguous regions, although negative patterns of dependence are also possible. The revised statistics considered in this paper may be used to search for either positive or negative dependence. Further, we focus upon physical distances, but "distance" may be interpreted as travel time, conceptual distance, or any other measure that enables the n points to be located in a space of one or more dimensions.

Getis and Ord (1992) introduced a family of statistics, G, that can be used as measures of spatial association in a number of circumstances. The local statistics, $G_i$ and $G_i^*$, enable us to detect pockets of spatial association that may not be evident when using global statistics. In this paper, the statistics $G_i$ and $G_i^*$ are extended to include variables that do not have a natural origin. The cost of this move is that the statistics lose some intuitive appeal, but the benefit is that the earlier restriction to variables with a natural origin no longer applies. In addition, the statistics may incorporate nonbinary weight matrices. The new form increases the statistics' flexibility and, therefore, their usefulness.

In section 3, we provide the results of a series of simulations designed to show the distributional and small-sample properties of the statistics in different circumstances. We show how the statistics are related to Moran's autocorrelation statistic, I, in section 4. In section 5, we address questions of edge effects and the correlation of $G_i^*$ values with one another.
Section 6 contains an approximate procedure that allows us to test the most extreme of the observed $G_i$ values as a test for hot spots; section 7 examines the effect of global autocorrelation upon these local tests. Finally, in section 8, we give an example of the use of the $G_i$ statistics in the spatial analysis of the locations of those suffering from AIDS in San Francisco and neighboring areas for the period 1989-1993.

3. THE REWRITTEN STATISTICS

In Getis and Ord (1992), the statistic $G_i(d)$ is defined as

$$G_i(d) = \frac{\sum_j w_{ij}(d)\,x_j}{\sum_j x_j}, \qquad j \neq i, \qquad (2)$$

where $\{w_{ij}(d)\}$ is a symmetric one/zero spatial weight matrix with ones for all links defined as being within distance d of a given i; all other links are zero, including the link of point i to itself. Throughout the paper, the d argument is dropped when only a single distance is under consideration. The sum of the weights is written as

$$W_i = \sum_{j \neq i} w_{ij}(d). \qquad (3)$$

The numerator of (2) is the sum of all $x_j$ within d of i but not including $x_i$. The denominator is the sum of all $x_j$ not including $x_i$. When we set

$$\bar{x}(i) = \frac{\sum_{j \neq i} x_j}{n - 1}, \qquad s^2(i) = \frac{\sum_{j \neq i} x_j^2}{n - 1} - [\bar{x}(i)]^2, \qquad (4)$$

it may be shown that

$$E[G_i] = \frac{W_i}{n - 1}, \qquad \operatorname{Var}(G_i) = \frac{W_i(n - 1 - W_i)}{(n - 1)^2 (n - 2)} \cdot \frac{s^2(i)}{[\bar{x}(i)]^2}. \qquad (5)$$

It should be noted that Getis and Ord (1992) used $(Y_{i1}, Y_{i2})$ in place of $[\bar{x}(i), s^2(i)]$. It was shown (Getis and Ord 1992) that if $E(G_i)$ is bounded away from 0 and from 1, then the permutations distribution of $G_i$ under $H_0$ approaches normality.

We now redefine $G_i$ as a standard variate by taking the statistic minus its expectation, $E[G_i] = W_i/(n - 1)$, divided by the square root of its variance; at the same time we allow the weights to be nonbinary. The resulting measures are

$$G_i(d) = \frac{\sum_j w_{ij}(d)\,x_j - W_i\,\bar{x}(i)}{s(i)\left\{[(n - 1)S_{1i} - W_i^2]/(n - 2)\right\}^{1/2}}, \qquad j \neq i. \qquad (6)$$

Similarly, if we include $w_{ii} \neq 0$, the standardized $G_i^*$ statistic is

$$G_i^*(d) = \frac{\sum_j w_{ij}(d)\,x_j - W_i^*\,\bar{x}}{s\left\{[nS_{1i}^* - W_i^{*2}]/(n - 1)\right\}^{1/2}}, \qquad \text{all } j. \qquad (7)$$

In (6) and (7), we have $W_i^* = W_i + w_{ii}$, $S_{1i} = \sum_j w_{ij}^2$ ($j \neq i$), and $S_{1i}^* = \sum_j w_{ij}^2$ (all j); $\bar{x}$ and $s^2$ denote the usual sample mean and variance.

Numerical results for (6) and (7) are given in Table 1.
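Equations (6) and (7) can be checked numerically. The Python sketch below codes both statistics for arbitrary weights and applies them to the eight-point data of Examples 4.1 and 4.2 in section 4. The function names `g_i`, `g_i_star`, and `standardized_raw_g_i` are hypothetical, and this is an illustration rather than the authors' code.

The point values are reconstructed from those examples and from the legible Figure 1 labels. Which of points 1 and 7 carries -0.35 and which carries 0.21 is an assumption. The choice does not affect the binary-weight results, because both points lie within 30 of point 5.

The helper `standardized_raw_g_i` takes the original ratio of equation (2), subtracts its permutation mean $W_i/(n-1)$, and divides by its permutation standard deviation. For binary weights and a positive $\bar{x}(i)$, the result should match (6).

```python
import math


def g_i(x, w, i):
    """Standardized G_i(d) of eq. (6); point i is excluded from every sum.

    x: observed values x_j; w: weights w_ij for the fixed point i (w[i] ignored).
    """
    n = len(x)
    others = [j for j in range(n) if j != i]
    xbar_i = sum(x[j] for j in others) / (n - 1)                    # eq. (4)
    s2_i = sum(x[j] ** 2 for j in others) / (n - 1) - xbar_i ** 2   # eq. (4)
    w_i = sum(w[j] for j in others)                                  # W_i
    s1_i = sum(w[j] ** 2 for j in others)                            # S_1i
    num = sum(w[j] * x[j] for j in others) - w_i * xbar_i
    den = math.sqrt(s2_i) * math.sqrt(((n - 1) * s1_i - w_i ** 2) / (n - 2))
    return num / den


def g_i_star(x, w):
    """Standardized G_i*(d) of eq. (7); all j, including w_ii."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum(v ** 2 for v in x) / n - xbar ** 2   # divisor n (gives 1.7237 in Example 4.2)
    w_star = sum(w)                               # W_i*
    s1_star = sum(v ** 2 for v in w)              # S_1i*
    num = sum(a * b for a, b in zip(w, x)) - w_star * xbar
    den = math.sqrt(s2) * math.sqrt((n * s1_star - w_star ** 2) / (n - 1))
    return num / den


def standardized_raw_g_i(x, w, i):
    """Original ratio G_i(d) minus W_i/(n-1), divided by its permutation SD.

    Only meaningful for binary weights and a positive xbar(i).
    """
    n = len(x)
    others = [j for j in range(n) if j != i]
    xbar_i = sum(x[j] for j in others) / (n - 1)
    s2_i = sum(x[j] ** 2 for j in others) / (n - 1) - xbar_i ** 2
    w_i = sum(w[j] for j in others)
    ratio = sum(w[j] * x[j] for j in others) / sum(x[j] for j in others)
    var = w_i * (n - 1 - w_i) / ((n - 1) ** 2 * (n - 2)) * s2_i / xbar_i ** 2
    return (ratio - w_i / (n - 1)) / math.sqrt(var)


# Points 1..8 of Figure 1 (list index 0..7); point 5 is index 4.
# Assumption: point 1 = -0.35 and point 7 = 0.21 (they could be the other way round).
x = [-0.35, -1.62, -0.05, 1.86, 2.17, 1.67, 0.21, -1.03]
dist_to_5 = [23, 44, 37, 13, 0, 7, 28, 52]   # Example 4.4 distances; 0 for point 5 itself

for d in (10, 20, 30):
    w = [1.0 if dj <= d else 0.0 for dj in dist_to_5]   # binary weights, w_55 = 1
    print(d, round(g_i(x, w, 4), 4), round(standardized_raw_g_i(x, w, 4), 4),
          round(g_i_star(x, w), 4))

# Example 4.3: rescaling the d = 30 weights to sum to 1 leaves G_5(30) unchanged.
w30 = [1.0 if dj <= 30 else 0.0 for dj in dist_to_5]
scale = sum(v for j, v in enumerate(w30) if j != 4)
print(round(g_i(x, [v / scale for v in w30], 4), 4))
```

With these reconstructed values, the first and last printed columns should agree with Examples 4.1 and 4.2 to about four decimal places, and the middle column should match the first. The inverse-distance case of Example 4.4 is not reproduced here, because it depends on the assumed placement of -0.35 and 0.21.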
As expected, the following patterns emerge for the distributions of $G_i^*$; similar results hold for $G_i$:

i. when the underlying distribution is normal, so is that of the test statistics (an exact result);
ii. when the underlying distribution is markedly skew, the distribution of the test statistics is non-normal, but approaches normality as the distance is increased;
iii. the statistics for edge cells approach normality more slowly because they have fewer neighbors; the convergence for corner cells is still slower.

4. PROPERTIES AND ASSOCIATIONS WITH I

Moran's statistic can be written as

$$I(d) = \frac{\sum_i (x_i - \bar{x}) \sum_j w_{ij}(x_j - \bar{x})}{W s^2}, \qquad (8)$$

temporarily dropping the d argument in the weights, for convenience; $W = \sum_i W_i^*$. Set $\bar{w}_i = W_i^*/n$, and let $K_{ri}$ denote the corresponding r-th order moment functions of the weights about $\bar{w}_i$; in particular, $K_{2i} = [nS_{1i}^* - W_i^{*2}]/(n - 1)$. Again for convenience, we then have

$$G_i^*(d) = \frac{\sum_j w_{ij}(x_j - \bar{x})}{s K_{2i}^{1/2}} \qquad (10)$$

and

$$I(d) = \frac{1}{W} \sum_i z_i K_{2i}^{1/2} G_i^*(d), \qquad (11)$$

where $z_i = (x_i - \bar{x})/s$, so that $I(d)$ is a weighted average of the local statistics.

TABLE 1
Mean, Standard Deviation, Skewness, and Kurtosis of $G_i^*(d)$ Statistic for Five Thousand Random Permutations of Each of Four Probability Distributions† for Five Distances by Type of Cell in a 10 by 10 Matrix

                 Central Cell      Edge Cell                  Corner Cell
Dist (d)         Mean     SD       SD      Skew     Kur       Mean     SD      Skew     Kur
Normal
  1.0           -.005    .998     1.014    .023     .049     -.001     .990    .004     .058
  1.5            .002   1.004     1.003    .053    -.019     -.006    1.011    .014    -.086
  2.0           -.005   1.010      .996   -.047    -.113      .005    1.001   -.036    -.041
  2.5           -.000   1.008      .991   -.025     .010     -.003     .991   -.060    -.127
  3.0            .009    .989     1.002   -.005    -.052      .004     .979    .007       --
Binary
  1.0              --   1.008     1.004    .016    -.482     -.004     .992    .003    -.621
  1.5           -.005   1.010      .996   -.027    -.259     -.001     .986    .020    -.446
  2.0            .002    .990      .991   -.008    -.188      .010     .989   -.002    -.221
  2.5           -.001   1.012     1.012    .026    -.075     -.005    1.006   -.012    -.298
  3.0           -.005   1.017     1.003    .008    -.016      .001     .992    .038    -.091
Poisson
  1.0            .010   1.006     1.013    .401    -.054     -.007     .989    .550     .340
  1.5            .007    .997      .995    .358     .064     -.015     .999    .431     .040
  2.0           -.008    .994     1.005    .247    -.024      .002     .994    .322    -.030
  2.5            .011    .998     1.002    .233    -.123        --    1.006    .292    -.006
  3.0           -.002    .992      .996    .097    -.012      .003    1.003    .192     .025
Exponential
  1.0            .001   1.004      .992   1.383    2.215      .003     .995   1.648    3.138
  1.5            .003   1.012     1.018   1.126    1.442      .009     .993   1.357    2.027
  2.0            .002   1.008     1.001    .867     .814     -.001     .994   1.108    1.362
  2.5            .017   1.004      .996    .651     .207      .021    1.004    .912     .802
  3.0           -.007    .984      .998    .480     .001      .003    1.000    .784     .520

-- = not legible in the source. The central-cell skewness and kurtosis columns and the edge-cell mean column are likewise not legible and are omitted.

† Characteristics of the distributions:

                  Normal    Binary    Exponential   Poisson
Mean              0.075     0.500     0.829         0.970
St. Deviation     1.044     0.503     1.000         0.979
Minimum Value    -3.0753    0.0000    0.0065        0.0000
Maximum Value     2.6186    1.0000    6.3343        4.0000

Notes:
(1) The skewness measure is the standardized third moment, skew = $m_3/m_2^{3/2}$, whereas the kurtosis measure is kur = $(m_4/m_2^2) - 3$, where $m_r = \sum (z - \bar{z})^r / n$. The population values of skew and kur are zero for the normal.
(2) For samples of size 5,000, the null hypothesis of normality is rejected at the $\alpha = 0.05$ level if |skew| > 0.068, or if kur > 0.14 or < -0.13.
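The Python sketch below checks two claims of this section numerically. It is a hypothetical illustration, not the authors' simulation code.

- **Table 1 patterns.** It mimics the spirit of the Table 1 experiment on a 10 by 10 lattice. One exponential sample is drawn and permuted 2,000 times (fewer than the paper's 5,000, to keep it quick). The shape of the $G_i^*(d)$ permutation distribution is then summarized with the skew and kur measures of note (1).
- **Moran decomposition.** It confirms that Moran's $I(d)$ equals $(1/W)\sum_i z_i K_{2i}^{1/2} G_i^*(d)$ when both are computed with the same weights. The weights include $w_{ii} = 1$, so that $W = \sum_i W_i^*$.

The seed, the choice of cell (4, 4) as the central cell, the distances used, and all function names are assumptions.

```python
import math
import random

SIZE = 10  # 10 by 10 lattice, as in Table 1


def lattice_cells(size=SIZE):
    """Cell centres (row, col); contiguous centres are distance 1.0 apart."""
    return [(r, c) for r in range(size) for c in range(size)]


def binary_weights(cells, i, d):
    """w_ij = 1 if cell j lies within distance d of cell i (including j = i)."""
    ri, ci = cells[i]
    return [1.0 if math.hypot(r - ri, c - ci) <= d + 1e-9 else 0.0
            for r, c in cells]


def g_star(x, w):
    """Standardized G_i*(d) of eq. (7); s^2 uses divisor n."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - xbar ** 2)
    w_star = sum(w)
    k2 = (n * sum(v * v for v in w) - w_star ** 2) / (n - 1)   # K_2i
    return (sum(a * b for a, b in zip(w, x)) - w_star * xbar) / (s * math.sqrt(k2))


def shape(sample):
    """Mean, SD, skew = m3/m2^1.5 and kur = m4/m2^2 - 3, as in note (1)."""
    n = len(sample)
    mean = sum(sample) / n
    m2, m3, m4 = (sum((v - mean) ** r for v in sample) / n for r in (2, 3, 4))
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def permutation_shape(x, w, reps, rng):
    """Shape of the permutation distribution of G_i*(d) for fixed weights w."""
    xs = list(x)          # copy, so the caller's data are not reordered
    values = []
    for _ in range(reps):
        rng.shuffle(xs)
        values.append(g_star(xs, w))
    return shape(values)


def moran_direct(x, weights):
    """I(d) = sum_i (x_i - xbar) sum_j w_ij (x_j - xbar) / (W s^2)."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((v - xbar) ** 2 for v in x) / n
    big_w = sum(sum(row) for row in weights)
    total = sum((x[i] - xbar) * sum(wij * (xj - xbar) for wij, xj in zip(weights[i], x))
                for i in range(n))
    return total / (big_w * s2)


def moran_from_local(x, weights):
    """The same quantity rebuilt as (1/W) sum_i z_i K_2i^(1/2) G_i*(d)."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((v - xbar) ** 2 for v in x) / n)
    big_w = sum(sum(row) for row in weights)
    total = 0.0
    for i, w in enumerate(weights):
        k2 = (n * sum(v * v for v in w) - sum(w) ** 2) / (n - 1)
        total += (x[i] - xbar) / s * math.sqrt(k2) * g_star(x, w)
    return total / big_w


rng = random.Random(1995)
cells = lattice_cells()
x = [rng.expovariate(1.0) for _ in cells]   # one markedly skew parent sample
centre = cells.index((4, 4))                # all cells within 3.0 lie on the lattice
corner = cells.index((0, 0))

for label, i, d in (("central", centre, 1.0), ("central", centre, 3.0),
                    ("corner", corner, 1.0)):
    mean, sd, skew, kur = permutation_shape(x, binary_weights(cells, i, d), 2000, rng)
    print(f"{label:8s} d={d}: mean={mean:.3f} sd={sd:.3f} skew={skew:.3f} kur={kur:.3f}")

weights = [binary_weights(cells, i, 1.5) for i in range(len(cells))]
print(moran_direct(x, weights), moran_from_local(x, weights))
```

In such runs the mean and SD should stay near 0 and 1, since (7) standardizes exactly under permutation. The central-cell skewness should fall as d grows from 1.0 to 3.0, and the corner cell should typically remain more skewed, in line with items ii and iii. The exact figures depend on the seed and will not reproduce Table 1.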
Further, when the permutations approach is used so that $(\bar{x}, s)$ are fixed, the standardized skewness and kurtosis measures (cf. Stuart and Ord 1987, p. 107) reduce to

$$\gamma_1(G_i^*) = (K_{3i}/K_{2i}^{3/2})\,\mu_3, \qquad (12)$$

$$\gamma_2(G_i^*) = (K_{4i}/K_{2i}^{2})\,(\mu_4 - 3), \qquad (13)$$

where $n\mu_r = \sum_i z_i^r$, so that the $\mu_r$ represent the moments of the original set of n observations.

For example, suppose location i has m neighbors at distance d or less, and that binary weights are used. The weight moments $K_{ri}$ then depend only on m and n, so equations (12) and (13) give $\gamma_1$ and $\gamma_2$ directly in terms of m, n, $\mu_3$, and $\mu_4$. Generally n will be large relative to m, since the $G_i^*$ are looking at local patterns, so we have, approximately,

$$\gamma_1(G_i^*) \approx \mu_3 / m^{1/2}, \qquad \gamma_2(G_i^*) \approx (\mu_4 - 3)/m,$$

corresponding to the usual rates of convergence with the Central Limit Theorem. Thus, provided d is not too small and the weights are not too uneven, approximate normality is a reasonable assumption.

EXAMPLE 4.1. Suppose a variable X is spatially distributed as in Figure 1. The numbers in parentheses are the identifying numbers for the observations. Suppose our interest is in the possible clustering of high values in the vicinity of point 5 but not including point 5 itself. We decide to select increments of 10 meters from point 5 to a distance of 30 meters (Figure 2). First, we use equation (4) to find $\bar{x}(5)$ and $s^2(5)$. These are 0.0986 and 1.4336, respectively. We then select the $G_i(d)$ statistic, since we are excluding point i. In this example, the weights are binary; that is, $w_{ij} = 1$ if point j is within d of point i and zero otherwise. For example, when d = 10, $w_{51} = 0$ since the distance between point 5 and point 1 is greater than 10. Using equation (6), we get

$$G_5(10) = \frac{1.67 - (1)(0.0986)}{\{[(1)(8 - 1 - 1)(1.4336)]/(8 - 2)\}^{1/2}} = 1.3125;$$

$$G_5(20) = 2.1562; \qquad G_5(30) = 1.7692.$$

From these results, it is clear that the clustering of positive values around point 5 reaches a maximum in the neighborhood of twenty meters.

[FIG. 1. Spatial distribution of X; legible point labels include (2) -1.62, (3) -.05, and (8) -1.03. Scale bar: 0 to 30 miles.]

[FIG. 2. Distances of 10, 20, and 30 around point 5. Scale bar: 0 to 30 miles.]

EXAMPLE 4.2.
Now we focus on possible clustering around point 5, including point 5 itself. For this we use the $G_i^*(d)$ statistic. Here $\bar{x}$ is 0.3575 and $s^2$ is 1.7237, and

$$G_5^*(10) = 1.8179; \qquad G_5^*(20) = 2.4078;$$

$$G_5^*(30) = \frac{(2.17 + 1.67 + 1.86 - .35 + .21) - (5)(.3575)}{\{[(5)(8 - 5)(1.7237)]/(8 - 1)\}^{1/2}} = 1.9629.$$

In this example, the clustering of positive values is much more in evidence than in the previous example, simply because the value at point 5 is included in the calculations and point 5 happens to be associated with a large positive score.

EXAMPLE 4.3. Now suppose that, instead of a binary weighting scheme, we scale the weights so that the sum of the weights within d of i is 1; that is, each nonzero weight is $1/W_i$. In this case $G_i$ and $G_i^*$ are unchanged, because both statistics are homogeneous of degree zero in the weights, that is, scale invariant. Thus $G_5(30) = 1.7692$, the same result as in Example 4.1.

EXAMPLE 4.4. In this nonbinary example, we weight each observation by $w_{ij} = (1/d_{ij})/W_i'$, where $W_i' = \sum_j (1/d_{ij})$. Then $W_i = \sum_j w_{ij} = 1$, and points close to point 5 are given more weight than far points. In this case, we seek only one value of $G_i$. Values of d for each of the j points are (1) 23, (2) 44, (3) 37, (4) 13, (6) 7, (7) 28, (8) 52. Again using equation (6), we have $G_5(1/d_{5j}) = 1.9893$. This procedure cannot be used for $G_i^*$ simply because $w_{ii} = \infty$; however, modifications to the weights such as $1/(a + d_{ij})$, $a > 0$, could clearly be used.

5. CORRELATION STRUCTURE AND EXPERIMENTAL RESULTS

Clearly the $G_i$ and $G_i^*$ values for various i locations on the same map are not independent, especially if the i locations are within distance d of one another. In the remainder of this section, we focus upon $G_i^*$ since the analysis is more straightforward. Similar results hold for $G_i$.

For a particular i, say A, $G_i^*$ is defined in terms of the association of A to all locations j within d of A. If another cell, B, is within d of A, then the $G_i^*$ value associated with B is dependent on a number of the same values on which A is dependent. In Figure 3, the cells within distance 2.0 of A are denoted with an a, and the cells within 2.0 of B are denoted as b. As before, a distance of 1.0 is measured from the center of a cell to the center of a contiguous cell. Of the thirteen cells within 2.0 of A, five overlap with the cells within 2.0 of B. It is clear, then, that the $G_i^*(d = 2)$ values of A and B are correlated to some degree.

[FIG. 3. Cells within distance 2.0 of A (marked a) and of B (marked b) on a regular lattice.]

The degree of correlation depends not only on the overlap but also on the number of regions; see, for example, equation (18) below. In nonlattice situations, the configuration of the units will also bear on the degree of correlation. When $\{w_{ij}\}$ are binary, the correlations for the $G_i^*$ statistic are found to be

$$\operatorname{corr}(G_i^*, G_k^*) = \frac{n W_{ik}^* - W_i^* W_k^*}{\{W_i^*(n - W_i^*)\,W_k^*(n - W_k^*)\}^{1/2}},$$

where $W_{ik}^* = \sum_j w_{ij} w_{kj}$. For binary weights, $W_{ik}^*$ is the number of locations within d of both i and k. The expression for nonbinary weights differs only in that the denominator changes:

$$\operatorname{corr}(G_i^*, G_k^*) = \frac{n W_{ik}^* - W_i^* W_k^*}{\{[nS_{1i}^* - W_i^{*2}][nS_{1k}^* - W_k^{*2}]\}^{1/2}}.$$

Consider a regular lattice, as shown in Figure 3, and restrict attention to interior cells, such that both i and k have m neighbors, so that $W_i^* = W_k^* = m$. Then
