J. K. Ord
Arthur Getis
Local Spatial Autocorrelation Statistics:
Distributional Issues and an Application
The statistics G_i(d) and G_i^*(d), introduced in Getis and Ord (1992) for the study of local pattern in spatial data, are extended and their properties further explored. In particular, nonbinary weights are allowed and the statistics are related to Moran’s autocorrelation statistic, I. The correlations between nearby values of the statistics are derived and verified by simulation. A Bonferroni criterion is used to approximate significance levels when testing extreme values from the set of statistics. An example of the use of the statistics is given using spatial-temporal data on the AIDS epidemic centering on San Francisco. Results indicate that in recent years the disease is intensifying in the counties surrounding the city.
1. INTRODUCTION
In spatial data analysis, it is often necessary to determine whether or not identifiable spatial patterns exist. For example, we may test for spatial pattern by focusing on the locations of the sample points, by studying the values associated with these locations given the sampling pattern, or by combining these analyses. There are many ways to test for the existence of such patterns; perhaps the most popular is Moran’s I statistic, which is used to test the null hypothesis that the spatial autocorrelation of a variable is zero. If the null hypothesis is rejected, the variable is said to be spatially autocorrelated. Traditional analyses such as nearest-neighbor analysis, K-function analysis, and the semivariogram are also widely used to study spatial patterns. All of these statistics are global in the sense that they require measurements from all or many geo-referenced points in the sample.
This research was supported by the National Science Foundation, grant no. SES-9123832. The
authors thank Cleridy Lennert of Scripps Institution of Oceanography and Serge Rey of San Diego
State University for making programming suggestions and Xioake Zhang and Long Gen Ying of San
Diego State University for gathering data and carrying out the simulations. They are grateful to Luc
Anselin and the referees for their insightful comments which led to considerable improvements in
the paper.
J. K. Ord is the David H. McKinley Professor of Business Administration in the department of management science and information systems at The Pennsylvania State University. Arthur Getis is Stephen and Mary Birch Professor of Geographical Studies in the department of geography at San Diego State University.
Geographical Analysis, Vol. 27, No. 4 (October 1995) © Ohio State University Press
In recent years, there has been growing interest in local measures of spatial dependence. Much of this work has been inspired by the search for valid tests for the clustering of cases of rare diseases; see, for example, Stone (1988), Cuzick and Edwards (1990), and Cressie (1992). Besag and Newell (1991) draw a useful distinction between general and focused tests. General tests are concerned with overall patterns in a large region, whereas focused tests “concentrate upon one or more smaller regions selected ostensibly because of some factor (for example, the location of a nuclear installation) that has been previously hypothesized to be associated with the disease.” Besag and Newell (1991) go on to discuss tests for the detection of clusters, whose purpose is to identify “hot spots” without any preconceptions about their locations.
It is apparent that a test for hot spots could serve the same role as a focused test, in that the hot spot should emerge from the pack if its local structure is sufficiently unusual. Furthermore, such an approach affords some protection against the biases that may arise when only selected areas are tested. Indeed, focused tests must rely upon either the availability of reference data for similar areas well removed from the putative source (Cuzick and Edwards 1990) or an adjustment to the distribution of the test statistic to compensate for the search for hot spots (cf. Stone 1988). In this regard, Besag and Newell (1991) point out the difficulties inherent in the original version of the Geographical Analysis Machine (GAM) introduced by Openshaw et al. (1987); they provide a modified analysis to overcome these difficulties. Besag and Newell also point out that when region i has n_i cases of a disease in a population of t_i, the random permutation distribution for the Moran statistic may not be an appropriate frame of reference. This difficulty may arise when urban (high t_i) areas tend to be clustered, and likewise rural (low t_i) areas. Provided that the {n_i} and {t_i} are not too small, the difficulty may be resolved by using the pooled incidence estimator p̂ = Σ_i n_i / Σ_i t_i and then computing the standard scores for each region as
$$x_i = \frac{n_i - t_i \hat{p}}{\left[ t_i \hat{p} (1 - \hat{p}) \right]^{1/2}}. \qquad (1)$$
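The pooled-incidence standardization can be sketched in a few lines of Python. This is a minimal sketch that assumes binomial standard scores, x_i = (n_i - t_i p̂)/[t_i p̂(1 - p̂)]^{1/2}; the function name `pooled_standard_scores` and the counts in the usage lines are hypothetical.

```python
import math

def pooled_standard_scores(cases, populations):
    """Standard scores of regional counts against the pooled incidence rate.

    Assumption: binomial form x_i = (n_i - t_i*p) / sqrt(t_i*p*(1 - p)),
    where p = sum(n_i) / sum(t_i) is the pooled incidence estimator.
    """
    if len(cases) != len(populations):
        raise ValueError("cases and populations must have the same length")
    p_hat = sum(cases) / sum(populations)
    scores = []
    for n_i, t_i in zip(cases, populations):
        expected = t_i * p_hat                          # count expected at the pooled rate
        sd = math.sqrt(t_i * p_hat * (1.0 - p_hat))     # binomial standard deviation
        scores.append((n_i - expected) / sd)
    return p_hat, scores

# Hypothetical data: one urban region (large t_i) and two rural regions.
cases = [30, 2, 8]
populations = [10000, 1000, 1000]
p_hat, x = pooled_standard_scores(cases, populations)
print(round(p_hat, 5), [round(v, 3) for v in x])
```

Because every region is compared with the same pooled rate, the third hypothetical region, with a modest raw count in a small population, receives a standard score above 2, while the large urban region does not stand out.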
The focus of this paper is a pair of tests for the detection of clusters, introduced by Getis and Ord (1992). These statistics are especially useful in cases where global statistics may fail to alert the researcher to significant pockets of clustering. For example, Getis and Ord showed that the distribution of Sudden Infant Death Syndrome in North Carolina for the period 1979-84 did not display any global spatial pattern, but that a few counties in the southern part of the state displayed a clustering of cases.
2. STATEMENT OF THE PROBLEM
Consider an area subdivided into n regions, i = 1, 2, . . . , n, where each region is identified with a point whose Cartesian coordinates are known. Each i has associated with it a value x_i that represents an observation upon the random variable X_i. Typically, it will be assumed that the X_i have identical marginal distributions; further, if the X_i are independent, we say that there is no spatial structure. Independence implies the absence of spatial autocorrelation, but the converse is not necessarily true. Nevertheless, tests for spatial autocorrelation are typically viewed as adequate assessments of dependence.
Usually, if spatial autocorrelation exists, it will be exhibited by similarities between contiguous regions, although negative patterns of dependence are also possible. The revised statistics considered in this paper may be used to search for either positive or negative dependence. Further, we focus upon physical distances, but “distance” may be interpreted as travel time, conceptual distance, or any other measure that enables the n points to be located in a space of one or more dimensions.
Getis and Ord (1992) introduced a family of statistics, G, that can be used as measures of spatial association in a number of circumstances. The local statistics, G_i and G_i^*, enable us to detect pockets of spatial association that may not be evident when using global statistics. In this paper, the statistics G_i and G_i^* are extended to include variables that do not have a natural origin. The cost of this move is that the statistics lose some intuitive appeal, but the benefit is that the earlier restriction to variables with a natural origin no longer applies. In addition, the statistics may incorporate nonbinary weight matrices. The new form increases the statistics’ flexibility and, therefore, their usefulness.
In section 3, we provide the results of a series of simulations designed to show the distributional and small-sample properties of the statistics in different circumstances. We show how the statistics are related to Moran’s autocorrelation statistic, I, in section 4. In section 5, we address questions of edge effects and the correlation of G_i^* values with one another. Section 6 contains an approximate procedure that allows us to test the most extreme of the observed G_i values, as a test for hot spots; section 7 examines the effect of global autocorrelation upon these local tests. Finally, in section 8, we give an example of the use of the G_i statistics with regard to the spatial analysis of the location of those suffering from AIDS in San Francisco and neighboring areas for the period 1989-1993.
3. THE REWRITTEN STATISTICS
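In Getis and Ord (1992), the statistic G_i(d) is defined as
$$G_i(d) = \frac{\sum_{j} w_{ij}(d)\, x_j}{\sum_{j} x_j}, \qquad j \neq i, \qquad (2)$$
where {w_ij(d)} is a symmetric one/zero spatial weight matrix with ones for all links defined as being within distance d of a given i; all other links are zero, including the link of point i to itself. Throughout the paper, the d argument is dropped when only a single distance is under consideration. The sum of the weights is written as
$$W_i = \sum_{j \neq i} w_{ij}(d). \qquad (3)$$
The numerator of (2) is the sum of all x_j within d of i but not including x_i. The denominator is the sum of all x_j not including x_i. When we set
$$\bar{x}(i) = \frac{\sum_{j \neq i} x_j}{n - 1}, \qquad s^2(i) = \frac{\sum_{j \neq i} x_j^2}{n - 1} - \left[ \bar{x}(i) \right]^2, \qquad (4)$$
it may be shown that
$$E(G_i) = \frac{W_i}{n - 1}, \qquad \operatorname{Var}(G_i) = \frac{W_i (n - 1 - W_i)\, s^2(i)}{(n - 1)^2 (n - 2) \left[ \bar{x}(i) \right]^2}. \qquad (5)$$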
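As a check on the moments of the 1992 ratio, the following Python sketch compares E(G_i) = W_i/(n - 1) and Var(G_i) = W_i(n - 1 - W_i)s²(i)/[(n - 1)²(n - 2)x̄(i)²] with an exact enumeration of every permutation of the n - 1 values x_j (j ≠ i) over the other sites. It assumes binary weights and strictly positive data; the data vector, the neighbor list, and the function names are hypothetical.

```python
import itertools
import statistics

def moments_1992(x, i, neighbors):
    """E(G_i) and Var(G_i) for the 1992 ratio with binary weights, written in
    terms of xbar(i) and s^2(i) computed from the n - 1 values x_j, j != i."""
    n = len(x)
    others = [x[j] for j in range(n) if j != i]
    xbar_i = sum(others) / (n - 1)
    s2_i = sum(v * v for v in others) / (n - 1) - xbar_i ** 2
    W = len(neighbors)
    mean = W / (n - 1)
    var = W * (n - 1 - W) * s2_i / ((n - 1) ** 2 * (n - 2) * xbar_i ** 2)
    return mean, var

def exact_moments_1992(x, i, neighbors):
    """Exact permutation moments of G_i = sum_{j in N(i)} x_j / sum_{j != i} x_j,
    enumerating every arrangement of the n - 1 values over the sites j != i."""
    n = len(x)
    sites = [j for j in range(n) if j != i]
    total = sum(x[j] for j in sites)             # the denominator never changes
    slot = {j: k for k, j in enumerate(sites)}   # site -> position in a permutation
    g = [sum(p[slot[j]] for j in neighbors) / total
         for p in itertools.permutations([x[j] for j in sites])]
    return statistics.fmean(g), statistics.pvariance(g)

x = [2.0, 5.0, 1.0, 4.0, 3.0, 6.0]   # hypothetical positive data, n = 6
formula = moments_1992(x, 0, [1, 2])
exact = exact_moments_1992(x, 0, [1, 2])
print(formula)
print(exact)
```

The two pairs agree to rounding error; with six sites there are only 120 arrangements, so exhaustive enumeration is cheap.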
It should be noted that Getis and Ord (1992) used (Y_i1, Y_i2) in place of [x̄(i), s²(i)]. It was shown (Getis and Ord 1992) that if E(G_i) is bounded away from 0 and from 1, then the permutation distribution of G_i under H_0 approaches normality. We now redefine G_i as a standard variate by taking the statistic minus its expectation, E[G_i] = W_i/(n - 1), divided by the square root of its variance; at the same time we allow the weights to be nonbinary. The resulting measures are
$$G_i(d) = \frac{\sum_{j} w_{ij}(d)\, x_j - W_i\, \bar{x}(i)}{s(i) \left\{ \left[ (n - 1) S_{1i} - W_i^2 \right] / (n - 2) \right\}^{1/2}}, \qquad j \neq i. \qquad (6)$$
Similarly, if we include w_ii ≠ 0, the standardized G_i^* statistic is
$$G_i^*(d) = \frac{\sum_{j} w_{ij}(d)\, x_j - W_i^*\, \bar{x}}{s \left\{ \left[ n S_{1i}^* - W_i^{*2} \right] / (n - 1) \right\}^{1/2}}, \qquad \text{all } j. \qquad (7)$$
In (6) and (7), we have W_i^* = W_i + w_ii, S_1i = Σ_j w_ij² (j ≠ i), and S_1i^* = Σ_j w_ij² (all j); x̄ and s² denote the usual sample mean and variance of all n observations, with s² computed using divisor n.
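A minimal Python sketch of equations (6) and (7), assuming the weights for site i are passed as a plain list `w` (w[j] = w_ij). It checks the standardization by enumerating all permutations of a small hypothetical data set: under the permutation hypothesis both statistics should have mean 0 and variance 1 exactly. The data, the weights, and the helper names `g_i` and `g_i_star` are illustrative assumptions.

```python
import itertools
import math
import statistics

def g_i(x, w, i):
    """Standardized G_i of equation (6); w[j] holds w_ij, and w[i] is ignored."""
    n = len(x)
    js = [j for j in range(n) if j != i]
    xbar_i = sum(x[j] for j in js) / (n - 1)
    s_i = math.sqrt(sum(x[j] ** 2 for j in js) / (n - 1) - xbar_i ** 2)
    W = sum(w[j] for j in js)
    S1 = sum(w[j] ** 2 for j in js)
    num = sum(w[j] * x[j] for j in js) - W * xbar_i
    return num / (s_i * math.sqrt(((n - 1) * S1 - W ** 2) / (n - 2)))

def g_i_star(x, w):
    """Standardized G_i^* of equation (7); w is row i of the weights, w_ii included."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum(v ** 2 for v in x) / n - xbar ** 2)   # divisor n
    W = sum(w)
    S1 = sum(v ** 2 for v in w)
    num = sum(wj * xj for wj, xj in zip(w, x)) - W * xbar
    return num / (s * math.sqrt((n * S1 - W ** 2) / (n - 1)))

# Hypothetical data (n = 6) and nonbinary weights for site i = 0.
x = [0.3, -1.2, 2.5, 0.8, -0.4, 1.1]
w = [0.5, 1.0, 0.25, 0.0, 2.0, 0.0]

# G_i: x_0 stays at site 0 while the other five values are permuted.
gi_vals = [g_i([x[0], *p], w, 0) for p in itertools.permutations(x[1:])]
# G_i^*: all six values are permuted over all six sites.
gs_vals = [g_i_star(list(p), w) for p in itertools.permutations(x)]

print(statistics.fmean(gi_vals), statistics.pvariance(gi_vals))
print(statistics.fmean(gs_vals), statistics.pvariance(gs_vals))
```

The exact unit variance follows because, for a random permutation of N values with variance σ² (divisor N), Var(Σ_j w_j X_j) = σ²(NΣ_j w_j² - W²)/(N - 1), which is precisely the square of each denominator.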
Numerical results for (6) and (7) are given in Table 1. As expected, the following patterns emerge for the distributions of G_i^*, and similar results hold for G_i:
i. when the underlying distribution is normal, so is that of the test statistics (an exact result);
ii. when the underlying distribution is markedly skewed, the distribution of the test statistics is non-normal, but approaches normality as the distance is increased;
iii. the statistics for edge cells approach normality more slowly because they have fewer neighbors; the convergence for corner cells is slower still.
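The following Python sketch mimics the design of Table 1 on a small scale: it permutes one skewed (exponential) 10 by 10 map 5,000 times and records the mean, standard deviation, skewness, and kurtosis of G_i^* from equation (7) for a central, an edge, and a corner cell at d = 1.0 and d = 3.0. The cell coordinates, the random seed, the helper names, and the use of a single exponential sample are assumptions for illustration, not the authors' simulation settings.

```python
import math
import random

def lattice_weights(size, center, d):
    """Hypothetical helper: 0/1 G_i^* weights on a size x size lattice, with 1
    for each cell whose center lies within distance d of `center` (self included)."""
    r0, c0 = center
    return [1.0 if (r - r0) ** 2 + (c - c0) ** 2 <= d * d else 0.0
            for r in range(size) for c in range(size)]

def g_star_permutations(x, w, n_perm, rng):
    """Permutation distribution of the standardized G_i^* of equation (7)."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - xbar ** 2)   # divisor n
    W, S1 = sum(w), sum(v * v for v in w)
    denom = s * math.sqrt((n * S1 - W ** 2) / (n - 1))
    idx = [k for k, v in enumerate(w) if v]                # sites with nonzero weight
    y = list(x)
    out = []
    for _ in range(n_perm):
        rng.shuffle(y)                                     # one random permutation
        out.append((sum(w[k] * y[k] for k in idx) - W * xbar) / denom)
    return out

def moments(g):
    """Mean, SD, skew = m3/m2^1.5 and kur = m4/m2^2 - 3 (divisor n)."""
    n = len(g)
    mean = sum(g) / n
    m2, m3, m4 = (sum((v - mean) ** r for v in g) / n for r in (2, 3, 4))
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

rng = random.Random(1)
x = [rng.expovariate(1.0) for _ in range(100)]   # one hypothetical skewed 10 x 10 map
results = {}
for cell, center in {"central": (4, 4), "edge": (0, 4), "corner": (0, 0)}.items():
    for d in (1.0, 3.0):
        g = g_star_permutations(x, lattice_weights(10, center, d), 5000, rng)
        results[(cell, d)] = moments(g)
        print(cell, d, [round(v, 3) for v in results[(cell, d)]])
```

Means and standard deviations should sit near 0 and 1 in every case. The skewness at d = 1.0 should be clearly larger than at d = 3.0, and it is expected to be larger for the edge and corner cells than for the central one, in line with patterns ii and iii; the exact figures depend on the seed.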
4. PROPERTIES AND ASSOCIATIONS WITH I
Moran’s statistic can be written as
$$I(d) = \frac{\sum_i (x_i - \bar{x}) \sum_j w_{ij} (x_j - \bar{x})}{W s^2}, \qquad (8)$$
temporarily dropping the d argument in the weights, for convenience; W = Σ_i W_i^*. If we set
$$\bar{w}_i = \frac{W_i^*}{n}, \qquad K_{2i} = \frac{n S_{1i}^* - W_i^{*2}}{n - 1}, \qquad (9)$$
again for convenience, we have
$$G_i^* = G_i^*(d) = \frac{\sum_j w_{ij} (x_j - \bar{x})}{s K_{2i}^{1/2}} \qquad (10)$$
and
$$I(d) = \frac{1}{W} \sum_i K_{2i}^{1/2} z_i G_i^*(d), \qquad (11)$$
where z_i = (x_i - x̄)/s, so that I(d) is a weighted average of the local statistics.
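The identity linking Moran's I to the local statistics is easy to check numerically. The Python sketch below builds a hypothetical symmetric binary weight matrix (a ring of links plus random extras, so that no site is isolated and every K_2i is positive), computes I directly with s² in the denominator, rebuilds it as (1/W)Σ_i K_2i^{1/2} z_i G_i^*, and also evaluates the textbook form (n/W)Σ_iΣ_j w_ij(x_i - x̄)(x_j - x̄)/Σ_i(x_i - x̄)². The data, seed, and link probability are assumptions for illustration.

```python
import math
import random

rng = random.Random(7)
n = 12
x = [rng.gauss(0.0, 1.0) for _ in range(n)]        # hypothetical data

# Hypothetical symmetric binary weights: a ring (so no site is isolated)
# plus a few random extra links; w_ii = 0 throughout.
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        if j - i in (1, n - 1) or rng.random() < 0.2:
            w[i][j] = w[j][i] = 1.0

xbar = sum(x) / n
dev = [v - xbar for v in x]
s2 = sum(v * v for v in dev) / n                   # divisor n
s = math.sqrt(s2)
W = sum(map(sum, w))                               # W = sum_i W_i^*

# Moran's I with W s^2 in the denominator.
I_direct = sum(dev[i] * sum(w[i][j] * dev[j] for j in range(n))
               for i in range(n)) / (W * s2)

# Local G_i^* in deviation form, then I rebuilt from them.
I_from_local = 0.0
for i in range(n):
    W_i = sum(w[i])
    S1_i = sum(v * v for v in w[i])
    K2_i = (n * S1_i - W_i ** 2) / (n - 1)
    G_i = sum(w[i][j] * dev[j] for j in range(n)) / (s * math.sqrt(K2_i))
    I_from_local += math.sqrt(K2_i) * (dev[i] / s) * G_i
I_from_local /= W

# Textbook form, for comparison.
I_textbook = (n / W) * sum(w[i][j] * dev[i] * dev[j]
                           for i in range(n) for j in range(n)) / sum(v * v for v in dev)
print(I_direct, I_from_local, I_textbook)
```

All three numbers agree to rounding error, since K_2i^{1/2} G_i^* is just Σ_j w_ij(x_j - x̄)/s.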
TABLE 1
Mean, Standard Deviation, Skewness, and Kurtosis of G_i^*(d) Statistic for Five Thousand Random Permutations of Each of Four Probability Distributions for Five Distances by Type of Cell in a 10 by 10 Matrix

Central Cell
Distribution   Dist (d)   Mean          SD
Normal         1.0        -.005         .998
               1.5         .002         1.004
               2.0        -.005         1.010
               2.5        -.000         1.008
               3.0         .009         .989
Binary         1.0        (illegible)   1.008
               1.5        -.005         1.010
               2.0         .002         .990
               2.5        -.001         1.012
               3.0        -.005         1.017
Poisson        1.0         .010         1.006
               1.5         .007         .997
               2.0        -.008         .994
               2.5         .011         .998
               3.0        -.002         .992
Exponential    1.0         .001         1.004
               1.5         .003         1.012
               2.0         .002         1.008
               2.5         .017         1.004
               3.0        -.007         .984

[The central-cell skewness and kurtosis columns, all edge-cell and corner-cell columns, and the skewness and kurtosis rows below are not recoverable from this copy.]

Characteristics of the Distributions
                 Normal     Binary    Exponential   Poisson
Mean             0.075      0.500     0.829         0.970
St. Deviation    1.044      0.503     1.000         0.979
Minimum Value   -3.0753     0.0000    0.0065        0.0000
Maximum Value    2.6186     1.0000    6.3343        4.0000

(1) The skewness measure is the standardized third moment, skew = m_3/m_2^{3/2}, whereas the kurtosis measure is kur = (m_4/m_2^2) - 3, where m_r = Σ(x - x̄)^r/n. The population values of skew and kur are zero for the normal.
(2) For samples of size 5,000, the null hypothesis of normality is rejected at the α = 0.05 level if |skew| > 0.068, or if kur > 0.14 or kur < -0.13.
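The cutoffs in note (2) of Table 1 can be approximately reproduced from the large-sample standard errors of skew and kur under normality, (6/N)^{1/2} and (24/N)^{1/2} with N = 5,000. This Python sketch assumes a two-sided 5 percent test using those normal approximations; it is not necessarily how the published cutoffs were obtained.

```python
import math
from statistics import NormalDist

N = 5000                                  # permutations per cell in Table 1
z = NormalDist().inv_cdf(0.975)           # two-sided 5 percent critical value
skew_cut = z * math.sqrt(6 / N)           # large-sample SE of skew under normality
kur_cut = z * math.sqrt(24 / N)           # large-sample SE of kur under normality
print(round(skew_cut, 3), round(kur_cut, 3))
```

This gives 0.068 for skewness, matching the note, and about 0.136 for kurtosis. The slightly asymmetric kurtosis bounds in the note (0.14 and -0.13) presumably reflect a refinement that this symmetric approximation does not capture.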
Further, when the permutations approach is used so that (x̄, s) are fixed, the standardized skewness and kurtosis measures (cf. Stuart and Ord 1987, p. 107) reduce to
$$\gamma_1(G_i^*) = \left( K_{3i} / K_{2i}^{3/2} \right) \mu_3, \qquad (12)$$
$$\gamma_2(G_i^*) = \left( K_{4i} / K_{2i}^{2} \right) (\mu_4 - 3), \qquad (13)$$
where n μ_r = Σ_i z_i^r, so that μ_r represents the moments of the original set of n observations.
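For binary weights with m neighbors and n large relative to m, the skewness of G_i^* should be close to μ_3/m^{1/2}. The Python sketch below checks this by simulation: it draws a hypothetical exponential sample of n = 400 values, repeatedly places m randomly chosen values on a neighborhood (which is what a random permutation does), and compares the skewness of the resulting sums with μ_3/m^{1/2} for m = 4 and m = 16. Sample size, seed, and the values of m are assumptions chosen only for illustration.

```python
import math
import random

def skew(values):
    """Standardized third moment m3 / m2^1.5 (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(3)
x = [rng.expovariate(1.0) for _ in range(400)]   # hypothetical skewed data, n >> m
mu3 = skew(x)                                    # mu_3 of the standardized observations

approx, simulated = {}, {}
for m in (4, 16):
    # With m unit weights, G_i^* is an increasing affine function of the sum of
    # the m values that a random permutation places on i's neighborhood.
    sums = [sum(rng.sample(x, m)) for _ in range(20000)]
    simulated[m] = skew(sums)
    approx[m] = mu3 / math.sqrt(m)
    print(m, round(simulated[m], 3), round(approx[m], 3))
```

Small discrepancies are expected, both from simulation noise and from finite-population effects that the approximation ignores; these grow as m becomes a larger fraction of n.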
For example, suppose location i has m neighbors at distance d or less, and that binary weights are used. Generally n will be large relative to m, since the G_i^* are looking at local patterns, so it follows from equations (12) and (13) that, approximately,
$$\gamma_1(G_i^*) \approx \frac{\mu_3}{m^{1/2}}, \qquad \gamma_2(G_i^*) \approx \frac{\mu_4 - 3}{m},$$
corresponding to the usual rates of convergence with the Central Limit Theorem. Thus, provided d is not too small and the weights are not too uneven, approximate normality is a reasonable assumption.
EXAMPLE 4.1. Suppose a variable X is spatially distributed as in Figure 1. The numbers in parentheses are the identifying numbers for the observations. Suppose our interest is in the possible clustering of high values in the vicinity of point 5 but not including point 5 itself. We decide to select increments of 10 meters from point 5 to a distance of 30 meters (Figure 2).
First, we use equation (4) to find x̄(5) and s²(5). These are 0.0986 and 1.4336, respectively. Then, we select the G_i(d) statistic, since we are excluding point i. In this example, the weights are binary; that is, w_ij = 1 if point j is within d of point i and zero otherwise. For example, when d = 10, w_51 = 0, since the distance between point 5 and point 1 is greater than 10. Using equation (6), we get
$$G_5(10) = \frac{(1.67) - (1)(.0986)}{\left\{ \left[ (1)(8 - 1 - 1)(1.4336) \right] / (8 - 2) \right\}^{1/2}} = 1.3125;$$
$$G_5(20) = 2.1562;$$
$$G_5(30) = 1.7692.$$
From these results, it is clear that the clustering of positive values around point 5 reaches a maximum in the neighborhood of twenty meters.
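Example 4.1 can be reproduced with the following Python sketch of equation (6) for binary weights. The point values are an assumption: x_5 = 2.17, x_6 = 1.67, and x_4 = 1.86 are implied by the quantities reported in Examples 4.1 and 4.2; points 1 and 7 are assumed to carry -0.35 and 0.21 (in that order); and points 2, 3, and 8 use the labels legible in Figure 1. Only the values within 30 of point 5 enter the numerators here, so the order assumed for points 1 and 7 does not affect these three results. Distances from point 5 are those listed in Example 4.4.

```python
import math

# Assumed point values (reconstruction from the reported results and Figure 1).
x = {1: -0.35, 2: -1.62, 3: -0.05, 4: 1.86, 5: 2.17, 6: 1.67, 7: 0.21, 8: -1.03}
# Distances from point 5, as listed in Example 4.4.
dist5 = {1: 23, 2: 44, 3: 37, 4: 13, 6: 7, 7: 28, 8: 52}

def g5(d):
    """Binary-weight G_5(d) from equation (6), excluding point 5 itself."""
    others = [x[j] for j in dist5]
    n = len(x)
    xbar_i = sum(others) / (n - 1)                      # equation (4)
    s2_i = sum(v * v for v in others) / (n - 1) - xbar_i ** 2
    near = [x[j] for j, dj in dist5.items() if dj <= d]
    W = len(near)                                       # S_1i = W_i for 0/1 weights
    return (sum(near) - W * xbar_i) / math.sqrt(W * (n - 1 - W) * s2_i / (n - 2))

for d in (10, 20, 30):
    print(d, round(g5(d), 4))
```

The printed values are 1.3125, 2.1562, and 1.7692, matching the example; the intermediate x̄(5) and s²(5) come out as 0.0986 and 1.4336.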
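FIG. 1 [map of the eight observations; labels legible in this copy: (2) -1.62, (3) -.05, (8) -1.03; scale bar 0-30 miles]
FIG. 2 [the observations of Figure 1 with distance bands around point 5; scale bar 0-30 miles]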
EXAMPLE 4.2. Now we focus on possible clustering around point 5, including point 5 itself. For this we use the G_i^*(d) statistic. Here x̄ is 0.3575 and s² is 1.7237.
$$G_5^*(10) = 1.8179;$$
$$G_5^*(20) = 2.4078;$$
$$G_5^*(30) = \frac{(2.17 + 1.67 + 1.86 - .35 + .21) - (5)(.3575)}{\left\{ \left[ (5)(8 - 5)(1.7237) \right] / (8 - 1) \right\}^{1/2}} = 1.9629.$$
In this example, the clustering of positive values is much more in evidence than in the previous example simply because the value at point 5 is included in the calculations and point 5 happens to be associated with a large positive score.
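Example 4.2 follows in the same way from equation (7), with binary weights that include point 5 itself. The Python sketch below uses point values reconstructed from the reported results and the Figure 1 labels (x_5 = 2.17, x_6 = 1.67, x_4 = 1.86; -0.35 and 0.21 assumed for points 1 and 7; -1.62, -0.05, -1.03 for points 2, 3, 8); these are assumptions, not data given in the text. With s² computed using divisor n, they give x̄ = 0.3575 and s² ≈ 1.7237, as reported.

```python
import math

# Assumed point values (reconstruction from the reported results and Figure 1).
x = {1: -0.35, 2: -1.62, 3: -0.05, 4: 1.86, 5: 2.17, 6: 1.67, 7: 0.21, 8: -1.03}
# Distances from point 5 (Example 4.4), with point 5 itself at distance 0.
dist5 = {1: 23, 2: 44, 3: 37, 4: 13, 5: 0, 6: 7, 7: 28, 8: 52}

def g5_star(d):
    """Binary-weight G_5^*(d) from equation (7); point 5 counts as its own neighbor."""
    n = len(x)
    xbar = sum(x.values()) / n
    s2 = sum(v * v for v in x.values()) / n - xbar ** 2     # divisor n
    near = [x[j] for j, dj in dist5.items() if dj <= d]
    W = len(near)                                           # S_1i^* = W_i^* for 0/1 weights
    return (sum(near) - W * xbar) / math.sqrt(W * (n - W) * s2 / (n - 1))

for d in (10, 20, 30):
    print(d, round(g5_star(d), 4))
```

It prints 1.8179, 2.4078, and 1.9629. Note that G_5^*(20) and G_5^*(30) share the same denominator, because W_i^*(n - W_i^*) = 15 for both W_i^* = 3 and W_i^* = 5.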
EXAMPLE 4.3. Now suppose that instead of a binary weighting scheme, we scale the weights so that those within d of i sum to 1; that is, each is 1/W_i. In this case G_i and G_i^* are unchanged, because both statistics are invariant to a common rescaling of the weights. Thus G_5(30) = 1.7692, the same result as in Example 4.1.
EXAMPLE 4.4. In this nonbinary example, we weight each observation by w_ij = (1/d_ij)/W_i^0, where W_i^0 = Σ_j (1/d_ij), so that W_i = Σ_j w_ij = 1 and points close to point 5 are given more weight than far points. In this case, we seek only one value of G_i. Values of d for each of the j points are (1) 23, (2) 44, (3) 37, (4) 13, (6) 7, (7) 28, (8) 52. Again, using equation (6), we have G_5(1/d_5j) = 1.9893. This procedure cannot be used for G_i^* simply because w_ii = ∞; however, modifications to the weights such as 1/(a + d_ij), a > 0, could clearly be used.
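Examples 4.3 and 4.4 differ from Example 4.1 only in the weights, so one Python function for equation (6) with arbitrary weights covers both. The sketch rescales the binary d = 30 weights to sum to 1 (Example 4.3) and builds inverse-distance weights normalized to sum to 1 (Example 4.4). The point values are a reconstruction (x_5, x_4, and x_6 implied by the reported results; -0.35 and 0.21 assumed for points 1 and 7; Figure 1 labels for points 2, 3, and 8), not data given in the text, and `g5_weighted` is a hypothetical helper name.

```python
import math

# Assumed point values (reconstruction from the reported results and Figure 1).
x = {1: -0.35, 2: -1.62, 3: -0.05, 4: 1.86, 5: 2.17, 6: 1.67, 7: 0.21, 8: -1.03}
dist5 = {1: 23, 2: 44, 3: 37, 4: 13, 6: 7, 7: 28, 8: 52}   # distances from point 5

def g5_weighted(weights):
    """G_5 from equation (6) for arbitrary nonnegative weights w_5j (j != 5)."""
    n = len(x)
    xbar_i = sum(x[j] for j in dist5) / (n - 1)
    s_i = math.sqrt(sum(x[j] ** 2 for j in dist5) / (n - 1) - xbar_i ** 2)
    W = sum(weights.values())
    S1 = sum(v ** 2 for v in weights.values())
    num = sum(weights[j] * x[j] for j in weights) - W * xbar_i
    return num / (s_i * math.sqrt(((n - 1) * S1 - W ** 2) / (n - 2)))

binary30 = {j: (1.0 if dj <= 30 else 0.0) for j, dj in dist5.items()}
scaled30 = {j: v / sum(binary30.values()) for j, v in binary30.items()}   # Example 4.3
inv = {j: 1.0 / dj for j, dj in dist5.items()}
inv_norm = {j: v / sum(inv.values()) for j, v in inv.items()}             # Example 4.4

print(round(g5_weighted(binary30), 4), round(g5_weighted(scaled30), 4))
print(round(g5_weighted(inv_norm), 4))
```

The first line prints 1.7692 twice, confirming that rescaling the weights leaves G_i unchanged. With these assumed values the inverse-distance statistic comes out near 1.994 rather than the published 1.9893; the gap is small and may come from rounding in the listed distances or from the assumed assignment of values to points 1 and 7.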
5. CORRELATION STRUCTURE AND EXPERIMENTAL RESULTS
Clearly the G_i and G_i^* values for various i locations on the same map are not independent, especially if the i locations are within distance d of one another. In the remainder of this section we focus upon G_i^*, since the analysis is more straightforward. Similar results hold for G_i. For a particular i, say A, G_i^* is defined in terms of the association of A to all locations j within d of A. If another cell, B, is within d of A, then the G_i^* value associated with B is dependent on a number of the same values on which A is dependent.
In Figure 3, the cells within distance 2.0 of A (where, as before, a distance of 1.0 is measured from the center of a cell to the center of a contiguous cell) are denoted with an a, and the cells within 2.0 of B are denoted with a b. Of the thirteen cells within 2.0 of A, five overlap with the cells within 2.0 of B. It is clear, then, that the G_i^*(d = 2) values of A and B are correlated to some degree.
FIG. 3
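The overlap count can be checked by enumerating lattice cells. In this Python sketch the placement of B two cells from A along a row is an assumption (the layout of Figure 3 is not reproduced here), as is the 10 by 10 grid; with it, 13 cells lie within 2.0 of A and 5 of them also lie within 2.0 of B.

```python
def cells_within(center, d, size):
    """Lattice cells (row, col) whose centers lie within distance d of center,
    center included, on a size x size grid with unit spacing."""
    r0, c0 = center
    return {
        (r, c)
        for r in range(size)
        for c in range(size)
        if (r - r0) ** 2 + (c - c0) ** 2 <= d ** 2
    }

# Hypothetical placement: A in the interior, B two cells to its right.
A, B = (4, 4), (4, 6)
near_a = cells_within(A, 2.0, 10)
near_b = cells_within(B, 2.0, 10)
print(len(near_a), len(near_a & near_b))
```

Moving B to a knight's-move offset, two cells along a row and one along a column, reduces the overlap to 4.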
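The degree of correlation depends not only on the overlap but also on the number of regions; see, for example, equation (18) below. In nonlattice situations, the configuration of the units will also bear on the degree of correlation.
When {w_ij} are binary, the correlations for the G_i^* statistic are found to be
$$\operatorname{corr}(G_i^*, G_k^*) = \frac{n \sum_j w_{ij} w_{kj} - W_i^* W_k^*}{\left[ W_i^* (n - W_i^*)\, W_k^* (n - W_k^*) \right]^{1/2}}, \qquad (14)$$
with the expression for nonbinary weights being different only in that the denominator changes:
$$\operatorname{corr}(G_i^*, G_k^*) = \frac{n \sum_j w_{ij} w_{kj} - W_i^* W_k^*}{\left[ \left( n S_{1i}^* - W_i^{*2} \right) \left( n S_{1k}^* - W_k^{*2} \right) \right]^{1/2}}. \qquad (15)$$
Consider a regular lattice, as shown in Figure 3; if we restrict attention to interior cells, such that both i and k have m neighbors, we can write
$$W_i^* = W_k^* = m. \qquad (16)$$
Then
$$\operatorname{corr}(G_i^*, G_k^*) = \frac{n \sum_j w_{ij} w_{kj} - m^2}{m (n - m)}. \qquad (17)$$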
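Under the permutation hypothesis, with (x̄, s) fixed, the correlation between two G_i^* statistics is (nΣ_j w_ij w_kj - W_i^*W_k^*)/[(nS_1i^* - W_i^{*2})(nS_1k^* - W_k^{*2})]^{1/2}, which reduces to the binary case when S_1i^* = W_i^*. The Python sketch below checks this by simulation and is illustrative only: it assumes a 10 by 10 lattice with normal data and takes A and B two cells apart along a row with d = 2.0, so each has 13 cells and they share 5, giving (100 × 5 - 13²)/(13 × 87) ≈ 0.293. The helper names, seed, and number of permutations are assumptions.

```python
import math
import random

def window(center, d, size=10):
    """Hypothetical helper: 0/1 weights (self included) within distance d on a size x size lattice."""
    r0, c0 = center
    return [1.0 if (r - r0) ** 2 + (c - c0) ** 2 <= d * d else 0.0
            for r in range(size) for c in range(size)]

def corr_formula(wi, wk):
    """Permutation correlation of G_i^* and G_k^* for general weights."""
    n = len(wi)
    Wi, Wk = sum(wi), sum(wk)
    num = n * sum(a * b for a, b in zip(wi, wk)) - Wi * Wk
    den = math.sqrt((n * sum(a * a for a in wi) - Wi ** 2) *
                    (n * sum(b * b for b in wk) - Wk ** 2))
    return num / den

def pearson(u, v):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))

rng = random.Random(11)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]    # one hypothetical 10 x 10 map
wa, wb = window((4, 4), 2.0), window((4, 6), 2.0)
ia = [k for k, v in enumerate(wa) if v]
ib = [k for k, v in enumerate(wb) if v]
sa, sb = [], []
y = list(x)
for _ in range(20000):
    rng.shuffle(y)
    # G_i^* is an increasing affine function of sum_j w_ij y_j under permutation,
    # so correlating the raw sums gives the correlation of the G_i^*.
    sa.append(sum(y[k] for k in ia))
    sb.append(sum(y[k] for k in ib))
theory, empirical = corr_formula(wa, wb), pearson(sa, sb)
print(round(theory, 4), round(empirical, 4))
```

The empirical value should fall within a few hundredths of the formula; its standard error with 20,000 permutations is roughly 0.007.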