Evaluation of hotspot cluster detection using spatial scan statistic based on exact counting
Abstract
In this paper, we propose a novel approach to the detection of spatial clusters based on the linkage information of a map dataset. The spatial scan statistic has been widely used for detecting a hotspot cluster (or a coldspot cluster) in various fields, such as astronomy, biosurveillance, natural disasters, and forestry. This approach is based on the idea of finding the connected regional subset that maximizes the likelihood over the whole study area. To detect a hotspot cluster, which aggregates high-risk regions so as to attain the maximum likelihood, we need only search for such a cluster among all patterns of connected regional subsets. However, except when the study area contains extremely few regions, the total number of connected regional patterns is usually enormous, so we cannot investigate all of them. Consequently, it has not been possible to know whether a hotspot detected under a given scanning rule, such as those used in previous studies, truly has the maximum likelihood within the study area. A zero-suppressed binary decision diagram (ZDD), a data structure that has been used for frequent itemset mining, enables us to extract all of the potential cluster regions at a realistic computational load. In this study, we propose a hotspot detection method using ZDD-based enumeration and apply it to sudden infant death syndrome data in North Carolina. This completely new method enables us to detect a true hotspot cluster, that is, one that has the truly maximum likelihood. To evaluate the proposed method, we compare its properties with those of existing methods such as the flexible scan and the echelon scan, and discuss their suitability for different hotspot detection purposes.
Keywords
Spatial cluster detection; Spatial scan statistic; Echelon analysis; Zero-suppressed binary decision diagram
1 Introduction
Detecting where a problem occurs, for example mapping the incidence of infectious diseases or the hazards posed by natural disasters, is a basic and important step toward elucidating the causes of the problem and taking measures for environmental preservation or safety management. It is becoming easier to analyze various types of spatial data and display them on a map, thanks to dramatic advances in geographic information systems driven by increasingly sophisticated hardware and network technologies. For example, a shaded statistical map, such as a choropleth map, can show how quantitative information varies geographically. However, such a map only lets us compare the magnitudes of individual regions through ambiguous visual cues, and it remains difficult to locate spatial clusters on the basis of statistical evidence.
Several studies dealing with various types of spatial data have been conducted to detect spatial clusters. Besag and Newell (1991) divided cluster detection tests into two categories, focused and general. A focused test detects whether there are clusters around prespecified point sources, such as nuclear installations and incinerators. A general test looks for clusters anywhere in the study area. General tests are further classified as using either a global or a local statistic. A global statistic evaluates whether or not there are spatial clusters in the study area; examples include Moran’s I statistic (Moran 1948), based on spatial autocorrelation, the Cuzick–Edwards test (Cuzick and Edwards 1990), based on a k-nearest-neighbors method, and Tango’s index (Tango 1995), based on the data and a measure of closeness between regions. A local statistic is used to detect the locations of clusters. For example, Anselin (1995) proposed a local Moran’s I statistic that detects cluster regions from the perspective of spatial autocorrelation. Openshaw et al. (1987) and Besag and Newell (1991) attempted cluster detection using subregions defined by a predetermined rule. Tango’s index, extended to a local test (Tango 2000), has also been used in spatial epidemiology.
Recently, the spatial scan statistic (Kulldorff 1997) has been widely used for cluster detection, together with the freely available SaTScan™ software (Kulldorff 2018), and applied in fields such as astronomy, biosurveillance, natural disasters, and forestry. It is commonly used to evaluate the statistical significance of temporal and geographical clusters without requiring any prior assumptions about their location, time period, or size. In addition, several models of the spatial scan statistic have been proposed for different types of data, such as the Bernoulli (Kulldorff and Nagarwalla 1995), Poisson (Kulldorff 1997), ordinal (Jung et al. 2007), exponential (Huang et al. 2007), normal (Huang et al. 2009; Kulldorff et al. 2009), and multinomial (Jung et al. 2010) models. These approaches detect a hotspot or a coldspot cluster based on the likelihood ratio (LR) computed from the numbers of events inside and outside a connected regional subset, called a window. To detect the cluster with the highest LR, one would ideally calculate and compare the LR of every possible window. However, unless the study area contains very few regions, covering all possible windows is very difficult because their number is generally huge. Kulldorff and Nagarwalla (1995) proposed using a circular window, but it has been pointed out that such a window cannot detect a non-circular cluster, such as one shaped by a river or a road. To capture arbitrarily shaped clusters, several scanning techniques using non-circular windows have been proposed (Duczmal and Assunção 2004; Kurihara 2004; Patil and Taillie 2004; Tango and Takahashi 2005).
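The LR itself is not written out in this excerpt. As a minimal sketch, assuming the Poisson model of Kulldorff (1997), the log-likelihood ratio (LLR) of a window Z with c observed and E expected cases, out of C cases in the whole study area, is commonly written as c log(c/E) + (C − c) log((C − c)/(C − E)) when c > E and 0 otherwise, so that only an excess of cases counts toward a hotspot. The function name `poisson_llr` is our own and is not taken from any software.

```python
import math


def poisson_llr(c: float, e: float, total: float) -> float:
    """Log-likelihood ratio of the Poisson spatial scan statistic (assumed form).

    c     -- observed cases inside the window Z
    e     -- expected cases inside Z under the null hypothesis
    total -- total observed cases C (expected counts are assumed to sum to C)
    Windows without an excess (c <= e) score 0, since we look for hotspots.
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside_c = total - c
    outside_e = total - e
    # Treat 0 * log(0) as 0 when every case falls inside the window.
    outside = outside_c * math.log(outside_c / outside_e) if outside_c > 0 else 0.0
    return inside + outside
```

For instance, a window with 40 observed and 17.5 expected cases out of 70 gives an LLR of roughly 16.28, whereas a window with fewer cases than expected scores 0.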
In this paper, we propose a novel scanning method using a zero-suppressed binary decision diagram (ZDD) (Minato 1993) aimed at detecting the cluster with the truly maximum LR. The ZDD, a data structure that has been used for frequent itemset mining, enables us to extract all possible window patterns at a realistic computational load. By applying this technique to cluster detection, we can exactly determine a true hotspot or coldspot cluster, that is, a spatial cluster with the truly maximum LR. Specifically, when we limit the number of regions in a cluster to at most K, we can guarantee that the obtained cluster has the maximum LR among all windows of size at most K; hence, it is futile to search for another window of at most K regions with a higher LR. As we discuss in Sect. 5, compared with existing methods, our method is suitable for finding a spatial cluster that is small but has a high LR.
Section 2 defines a hotspot cluster and introduces the spatial scan statistic and several existing scanning methods. Section 3 describes ZDDs. In Sect. 4, we demonstrate the detection of a true hotspot cluster for a simple artificial dataset. We then apply the ZDD technique to real data: we exactly count the number of connected regional patterns and detect a true hotspot cluster for the North Carolina sudden infant death syndrome (SIDS) data. Furthermore, we compare it with hotspots obtained using existing scanning methods and discuss their suitability for different hotspot detection purposes. In Sect. 5, we evaluate the proposed method by clarifying the properties of each scanning method and conclude the paper with a discussion of our work.
2 Definition of hotspot cluster and methods for detecting hotspot
2.1 Test statistic
What matters here is how effectively and efficiently we can scan the study area and find the window \({\mathbf{Z}}\) whose log-likelihood ratio (LLR) is maximum. A complete investigation of all possible windows is usually impossible, because their number grows explosively as the number of regions in the study area increases. Kulldorff and Nagarwalla (1995) first proposed imposing a circular scanning window on the study area. The center of the circle is placed on the centroid of each region in turn, and for each center, the radius of the circle is varied from zero up to a user-defined limit. (By default, the window never includes more than 50% of the total population.) Their method is implemented in the SaTScan™ software (Kulldorff 2018) and is widely used for hotspot detection in various fields. However, because this technique scans with a circular window, it has difficulty in correctly detecting non-circular hotspots, such as those shaped by a river or a road. To detect arbitrarily shaped hotspots, several non-circular scanning techniques have been proposed. In this paper, we focus on the flexible scan (Tango and Takahashi 2005), for which free software is available, and on the echelon scan, which we proposed previously.
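The circular scan just described can be sketched as follows. This is a simplified illustration, not SaTScan™'s implementation: the function `circular_windows` and its inputs (region centroids and populations as dictionaries) are hypothetical, and regions at exactly the same distance from a center are added one at a time (ties broken by name) rather than simultaneously, as a true circle would add them.

```python
import math
from typing import Dict, List, Tuple


def circular_windows(
    centroids: Dict[str, Tuple[float, float]],
    population: Dict[str, float],
    max_fraction: float = 0.5,
) -> List[frozenset]:
    """Candidate windows of a circular scan (simplified illustration).

    Each region's centroid serves in turn as a circle centre; the radius grows
    so that regions enter in order of centroid distance, and every intermediate
    set is kept while it holds at most max_fraction of the total population.
    """
    total_pop = sum(population.values())
    windows = set()
    for cx, cy in centroids.values():
        # Regions ordered by distance from this centre (ties broken by name).
        ordered = sorted(
            centroids,
            key=lambda r: (math.hypot(centroids[r][0] - cx, centroids[r][1] - cy), r),
        )
        members: List[str] = []
        pop = 0.0
        for region in ordered:
            pop += population[region]
            if pop > max_fraction * total_pop:
                break  # a larger circle would exceed the population cap
            members.append(region)
            windows.add(frozenset(members))
    return sorted(windows, key=lambda w: (len(w), sorted(w)))
```

With four regions a, b, c, d placed on a line and one unit of population each, the 50% cap limits windows to two regions, giving the four single regions plus {a, b}, {b, c} and {c, d}. Note that a circular window never produces a set such as {a, c} that skips over a closer region.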
2.2 The flexible scan and the restricted LR
2.3 The echelon scan
Neighbor information and value for each region
Region name  Value  Neighboring regions 

A  1  B, D, E 
B  5  A, C, E 
C  2  B, E, F 
D  10  A, E, G, H 
E  3  A, B, C, D, F, H, I, J 
F  7  C, E, J 
G  8  D, H 
H  4  D, E, G, I 
I  6  E, H, J 
J  4  E, F, I 
As in the flexible scan, we first define the maximum number K of regions to be included in the hotspot. Until the number of regions in the window \({\mathbf{Z}}\) reaches the predefined number K, we move the scanning window from the top to the bottom of the dendrogram, incorporating the regions it passes into the window \({\mathbf{Z}}\). For example, for this sample data, if we set K to 50% of the total number of regions, i.e., \(K=5\), we obtain six window patterns: {D}, {D, G}, {I}, {F}, {I, F, J} and {B}. Among the scanned windows (each of which is, needless to say, a connected regional subset), the window \({\mathbf{Z}}\) with the highest LLR is regarded as the hotspot.
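The echelon dendrogram itself is not reproduced in this excerpt. As a simplified stand-in, the sketch below builds the closely related upper-level-set hierarchy of Patil and Taillie (2004): for each distinct value t, from high to low, the connected components of the regions with value at least t become candidate windows, provided they contain at most K regions. It is not the echelon scan. Because tied regions (here H and J, both with value 4) enter together, it yields {D}, {D, G}, {F}, {I} and {B} on the sample data but not {I, F, J}. The neighbor lists in `SAMPLE_NEIGHBORS` are our reading of the table, made symmetric by treating F and I as non-adjacent, as the separate windows {I} and {F} above imply; all names are hypothetical.

```python
from typing import Dict, List, Set

# Sample data from the table above; I and F are treated as non-adjacent (see lead-in).
SAMPLE_VALUES: Dict[str, float] = {
    "A": 1, "B": 5, "C": 2, "D": 10, "E": 3, "F": 7, "G": 8, "H": 4, "I": 6, "J": 4,
}
SAMPLE_NEIGHBORS: Dict[str, Set[str]] = {
    "A": {"B", "D", "E"},
    "B": {"A", "C", "E"},
    "C": {"B", "E", "F"},
    "D": {"A", "E", "G", "H"},
    "E": {"A", "B", "C", "D", "F", "H", "I", "J"},
    "F": {"C", "E", "J"},
    "G": {"D", "H"},
    "H": {"D", "E", "G", "I"},
    "I": {"E", "H", "J"},
    "J": {"E", "F", "I"},
}


def upper_level_set_windows(
    values: Dict[str, float], neighbors: Dict[str, Set[str]], k_max: int
) -> List[frozenset]:
    """Windows from the upper-level-set hierarchy (simplified echelon stand-in)."""
    windows: List[frozenset] = []
    for t in sorted(set(values.values()), reverse=True):
        active = {r for r, v in values.items() if v >= t}
        seen: Set[str] = set()
        for start in sorted(active):
            if start in seen:
                continue
            # Depth-first search for the connected component containing `start`.
            component: Set[str] = set()
            stack = [start]
            while stack:
                r = stack.pop()
                if r in component:
                    continue
                component.add(r)
                stack.extend(n for n in neighbors[r] if n in active and n not in component)
            seen |= component
            window = frozenset(component)
            if len(window) <= k_max and window not in windows:
                windows.append(window)
    return windows
```

Calling `upper_level_set_windows(SAMPLE_VALUES, SAMPLE_NEIGHBORS, 5)` returns the five windows listed in the lead-in; the LLR of each would then be evaluated to pick the hotspot.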
3 ZDDs
In this section, we explain ZDDs and how to use them to represent a huge number of windows. ZDDs were proposed by Minato (1993) as a compact data structure for representing a family of sets, and they have been used in research fields such as logic synthesis, symbolic model checking, and itemset mining. Recently, ZDDs have been applied to graph optimization problems, such as minimizing distribution loss in power networks (Inoue et al. 2014), the longest path problem (Kawahara et al. 2017b), evacuation planning for disasters (Takizawa et al. 2013), and designing electoral systems (Kawahara et al. 2017c). The key idea in using ZDDs for graph optimization is to directly construct a ZDD representing all solutions of the problem and then extract the optimal solution from the constructed ZDD. This method of ZDD construction is called frontier-based search (Sekine et al. 1995; Kawahara et al. 2017a). In what follows, we describe how to obtain and handle a huge number of windows using ZDDs and frontier-based search.
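To make the idea of a ZDD concrete, the following is a minimal, hypothetical Python sketch; it is neither Minato's implementation nor the API of any ZDD library. Each node tests one item and has a 0-edge (item absent) and a 1-edge (item present). A node whose 1-edge points to the empty family is removed (the zero-suppression rule), and identical nodes are shared through a unique table. Counting the sets in a family then takes time proportional to the number of nodes rather than to the number of sets.

```python
from typing import Dict, Iterable, Tuple

ZERO, ONE = 0, 1  # terminals: the empty family, and the family containing only the empty set


class ZDD:
    """Minimal zero-suppressed BDD over integer items (smaller items nearer the root)."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Tuple[int, int, int]] = {}   # id -> (item, lo, hi)
        self.unique: Dict[Tuple[int, int, int], int] = {}  # (item, lo, hi) -> id
        self.union_cache: Dict[Tuple[int, int], int] = {}

    def node(self, item: int, lo: int, hi: int) -> int:
        # Zero-suppression rule: if the "item present" branch is empty, skip the node.
        if hi == ZERO:
            return lo
        key = (item, lo, hi)
        if key not in self.unique:  # node sharing: identical sub-diagrams are stored once
            node_id = len(self.nodes) + 2
            self.unique[key] = node_id
            self.nodes[node_id] = key
        return self.unique[key]

    def top(self, f: int) -> float:
        return self.nodes[f][0] if f > ONE else float("inf")

    def single(self, items: Iterable[int]) -> int:
        """ZDD of the family that contains exactly one set."""
        f = ONE
        for item in sorted(set(items), reverse=True):
            f = self.node(item, ZERO, f)
        return f

    def union(self, f: int, g: int) -> int:
        if f == ZERO or f == g:
            return g
        if g == ZERO:
            return f
        key = (min(f, g), max(f, g))
        if key in self.union_cache:
            return self.union_cache[key]
        tf, tg = self.top(f), self.top(g)
        if tf < tg:  # no set in g contains f's top item
            item, lo, hi = self.nodes[f]
            result = self.node(item, self.union(lo, g), hi)
        elif tg < tf:  # no set in f contains g's top item
            item, lo, hi = self.nodes[g]
            result = self.node(item, self.union(f, lo), hi)
        else:
            item, flo, fhi = self.nodes[f]
            _, glo, ghi = self.nodes[g]
            result = self.node(item, self.union(flo, glo), self.union(fhi, ghi))
        self.union_cache[key] = result
        return result

    def count(self, f: int) -> int:
        """Number of sets in the family, in time linear in the number of nodes."""
        memo: Dict[int, int] = {ZERO: 0, ONE: 1}

        def rec(u: int) -> int:
            if u not in memo:
                _, lo, hi = self.nodes[u]
                memo[u] = rec(lo) + rec(hi)
            return memo[u]

        return rec(f)
```

For example, starting from `f = ONE` and applying `f = zdd.node(i, f, f)` for i = 59, 58, ..., 0 represents all 2^60 (about 1.15 × 10^18) subsets of 60 items with only 60 nodes. Building a ZDD of connected windows directly, as frontier-based search does, is beyond this sketch.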
Frontier-based search is a method for directly constructing, for a given input graph, a ZDD that represents the subgraphs we wish to obtain. It can handle graph structures such as paths (Knuth 2011), trees (Sekine et al. 1995), matchings (Kawahara et al. 2017a), and graph partitions (Kawahara et al. 2017c). We can impose various conditions on the resulting subgraphs, such as the number of edges, the connectivity of specified vertices, the existence (or nonexistence) of a cycle, and the degrees of vertices. A detailed explanation of frontier-based search is provided in Kawahara et al. (2017a). The number of subgraphs that frontier-based search can handle is huge. For example, Kawahara et al. (2017a) reported that the method constructed the ZDD representing \(8.32 \times 10^{33}\) spanning trees of a \(9 \times 9\) grid graph with 81 vertices in 67.1 s, whereas an existing algorithm that outputs spanning trees one by one did not finish within 1000 s even for a \(6 \times 6\) grid graph with 36 vertices.
4 Illustrative example
4.1 Numerical example
Results of detected hotspot cluster and its LLR for the artificial data using each scanning method with \(K=15\)
Scanning method  Window \({\mathbf{Z}}\) identified as hotspot  \(\mathrm{LLR} \,({\mathbf{Z}})\) 

ZDD-based scan  A, C, E, F, G, H, I, J, M, T, X, Y, b  908.72 
Flexible scan  G, H, O, U, V, X, Y  456.75 
Flexible scan with restriction  A, C, G, H  394.57 
Echelon scan  A, C, G, H, I, J, K  396.68 
4.2 Application to real data
4.2.1 SIDS data in North Carolina
As background for hotspot detection in North Carolina’s SIDS data, we first review the application reported in Kulldorff (1997). He continuously varied the radius of the circular scanning window from zero up to a maximum radius, with the window never including more than 50% of the total live births. We replicated his work using the SaTScan™ software; the detected hotspot consists of the counties {9, 24, 47, 78, 83} in the southern part of the state, with \(\mathrm{LLR}=25.38\), as shown in Fig. 12a. Here, we calculated the expected number of SIDS cases in county i using Eq. (4), where \(n_i\) is the number of live births in county \(i\ (i=1,2,\ldots ,100)\). The advantages of Kulldorff’s circular scanning method are its high power for circular hotspots, such as when an infectious disease spreads concentrically, and its low computational load owing to the simplicity of the algorithm. However, since the scanning window is restricted to a circle, a detected hotspot cannot be guaranteed to have the truly maximum LLR among hotspots of arbitrary shape.
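Equation (4) is not reproduced in this excerpt. Assuming it is the usual indirect standardization, E_i = C n_i / N, where C is the total number of SIDS cases and N the total number of live births, the expected counts can be computed as below; `expected_counts` is a hypothetical helper. The expected count of a window is then the sum of E_i over its counties.

```python
from typing import Dict


def expected_counts(cases: Dict[str, int], births: Dict[str, int]) -> Dict[str, float]:
    """Expected cases per county under a constant statewide rate.

    Assumes E_i = C * n_i / N, where C is the total number of cases and N the
    total number of live births, so that the expected counts sum to C.
    """
    total_cases = sum(cases.values())
    total_births = sum(births.values())
    rate = total_cases / total_births  # statewide cases per live birth
    return {county: rate * n for county, n in births.items()}
```

Because the expected counts sum to the observed total, the inside and outside terms of the Poisson LLR are computed on the same scale.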
4.2.2 True hotspot for SIDS in North Carolina
Number of connected regional patterns consisting of k counties, among North Carolina’s 100 counties
k  # of connected regional patterns  k  # of connected regional patterns 

1  100  51  9,539,833,615,460,317,913,312,187 
2  246  52  13,432,518,325,136,437,486,342,684 
3  795  53  18,119,645,581,242,239,769,807,495 
4  2,882  54  23,385,551,368,937,678,091,262,449 
5  11,126  55  28,837,581,658,170,132,827,248,077 
6  44,353  56  33,930,213,995,061,479,857,925,057 
7  179,312  57  38,040,538,109,687,239,817,601,494 
8  728,312  58  40,586,493,627,937,927,491,227,727 
9  2,957,129  59  41,159,605,058,904,624,057,232,443 
10  11,967,816  60  39,631,432,832,085,206,472,886,223 
11  48,193,253  61  36,195,840,257,051,144,775,781,104 
12  192,869,404  62  31,329,002,397,053,190,430,330,422 
13  766,397,399  63  25,678,347,179,424,563,245,821,654 
14  3,021,567,757  64  19,917,023,180,204,992,896,479,550 
15  11,811,058,877  65  14,610,267,344,128,376,243,769,670 
16  45,738,895,838  66  10,130,637,544,912,529,410,456,639 
17  175,317,515,044  67  6,636,682,330,452,184,688,626,379 
18  664,412,625,310  68  4,105,885,870,103,634,870,896,075 
19  2,486,475,462,576  69  2,397,828,995,361,079,543,297,569 
20  9,176,326,279,964  70  1,321,294,144,900,936,286,130,796 
21  33,347,254,902,049  71  686,686,250,588,748,314,307,035 
22  119,155,174,008,622  72  336,424,636,883,948,444,546,093 
23  418,015,761,602,764  73  155,295,419,030,501,647,079,397 
24  1,437,791,141,456,276  74  67,501,209,368,328,746,188,449 
25  4,842,450,935,754,499  75  27,608,883,376,075,994,142,565 
26  15,951,391,419,269,919  76  10,617,592,361,119,339,861,919 
27  51,339,186,679,913,789  77  3,835,685,917,868,138,195,098 
28  161,294,236,248,594,080  78  1,300,273,052,269,505,594,316 
29  494,251,533,535,716,806  79  413,104,595,626,229,131,822 
30  1,476,043,419,482,496,004  80  122,826,544,093,160,502,657 
31  4,292,821,604,575,914,676  81  34,119,660,711,160,955,328 
32  12,149,152,620,014,111,554  82  8,838,063,542,831,389,136 
33  33,432,159,868,016,049,614  83  2,129,993,157,106,413,681 
34  89,380,195,334,710,034,120  84  476,370,907,610,775,330 
35  231,958,312,855,416,232,001  85  98,573,205,377,747,577 
36  583,848,589,681,206,948,625  86  18,806,672,385,664,470 
37  1,424,119,318,718,367,585,003  87  3,294,955,724,333,584 
38  3,363,525,012,017,841,540,870  88  527,615,546,665,500 
39  7,686,172,126,658,045,104,784  89  76,788,169,869,075 
40  16,981,703,926,062,519,889,142  90  10,090,170,416,125 
41  36,251,238,041,618,809,688,056  91  1,187,576,406,944 
42  74,726,711,720,118,184,889,054  92  123,978,287,253 
43  148,663,783,172,787,140,792,478  93  11,341,618,761 
44  285,294,822,593,769,394,893,406  94  895,206,379 
45  527,879,454,591,096,181,007,485  95  59,736,856 
46  941,279,310,405,478,676,239,512  96  3,277,183 
47  1,616,664,290,712,134,585,900,134  97  141,928 
48  2,672,898,381,501,191,448,393,003  98  4,550 
49  4,251,136,234,955,695,942,282,594  99  96 
50  6,498,760,705,816,102,312,737,074  100  1 
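For small k, entries of this table can be cross-checked by a naive enumeration. The sketch below is not frontier-based search and builds no ZDD. It grows connected windows one region at a time and removes duplicates by hashing, so its running time grows with the number of windows itself, which the table shows becomes astronomically large. The function name is hypothetical, and the adjacency is supplied as a dictionary of neighbor sets. For the North Carolina map, the counts for k = 1 and k = 2 would correspond to the 100 counties and the 246 pairs of adjacent counties.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set


def count_connected_windows(neighbors: Dict[str, Set[str]], k_max: int) -> Counter:
    """Count connected regional subsets (windows) of each size 1..k_max.

    Every connected set of size k + 1 is obtained by adding a neighbouring
    region to some connected set of size k (remove a leaf of one of its
    spanning trees to see why); duplicates are removed by hashing frozensets.
    """
    counts: Counter = Counter()
    level: Set[FrozenSet[str]] = {frozenset([r]) for r in neighbors}
    for k in range(1, k_max + 1):
        counts[k] = len(level)
        if k == k_max:
            break
        grown: Set[FrozenSet[str]] = set()
        for window in level:
            # Regions adjacent to the window but not yet inside it.
            frontier = set().union(*(neighbors[r] for r in window)) - window
            for r in frontier:
                grown.add(window | {r})
        level = grown
    return counts
```

On the ten-region sample map of Sect. 2.3, this gives 10 windows of size 1 and 18 of size 2 (one per adjacent pair).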
Details of the detected hotspots
# of counties  o (observed cases)  n (live births)  E (expected cases)  RR  LLR  

Hotspot (a)  5  139  3636  72.67  2.01  25.38 
Hotspot (b)  6  156  41,851  83.61  1.97  26.79 
Hotspot (c)  7  167  44,707  89.31  1.98  29.00 
Hotspot (d)  8  180  48,541  96.97  1.97  30.81 
Hotspot (e)  10  263  79,992  159.80  1.78  31.90 
Hotspot (f)  11  287  87,806  175.41  1.79  34.54 
Hotspot (g)  14  295  89,766  179.33  1.80  36.37 
Hotspot (h)  23  443  147,269  294.20  1.71  42.08 
Hotspot (i)  18  359  111,820  223.39  1.80  42.16 
Hotspot (j)  38  748  286,161  571.67  1.61  42.62 
Hotspot (k)  41  812  316,602  632.49  1.62  43.29 
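The RR column is not defined in this excerpt. Assuming the usual Poisson-model definition, the observed-to-expected ratio inside the window divided by that outside it (as commonly reported by SaTScan™), it can be computed from o, E and the total number of cases C, which the table does not list. The helper name is hypothetical.

```python
def relative_risk(o: float, e: float, total: float) -> float:
    """Estimated risk inside the window divided by the estimated risk outside it.

    o, e  -- observed and expected cases inside the window
    total -- total cases C; expected counts are assumed to sum to C as well
    """
    inside = o / e
    outside = (total - o) / (total - e)
    return inside / outside
```

For example, a window with 40 observed and 17.5 expected cases out of 70 has RR = (40/17.5)/(30/52.5) = 4.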
4.2.3 Hotspot detection using existing scanning methods
In this section, we apply existing scanning methods to the SIDS data. With the maximum cluster size set to \(K=5,6,7,8,9\), the flexible scan detected the same hotspot as the true hotspot for \(K=5\) (Table 4a). Under each of the settings \(K=10,11,12,13,14,15,20\), it detected the same hotspot as the true hotspot cluster obtained for \(K=8\) or \(K=9\) (Table 4d). For \(K=30\), it detected a hotspot consisting of 14 counties with \(\mathrm{LLR}=36.37\) (Table 4g), and for \(K=40\), a hotspot consisting of 18 counties with \(\mathrm{LLR}=42.16\) (Table 4i). The flexible scan thus performs comparably to the ZDD-based exhaustive scan in detecting high-likelihood clusters; however, its computational load becomes a problem when we want to detect a large cluster. When FleXScan v3.1.2 was applied to North Carolina’s 100 counties, the execution time was about 2600 s for \(K = 30\) and about 1,645,300 s (about 19 days) for \(K = 40\) (on a Windows 7 PC with an Intel Core i7 X990 CPU (3.47 GHz) and 24 GB of memory).
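The source of this cost can be seen in a simplified sketch of how flexible-scan windows are formed, following our reading of Tango and Takahashi (2005): for each region i, take i and its K − 1 nearest regions by centroid distance, and keep every connected subset that contains i. This is not FleXScan's code; the function names and inputs are hypothetical. The inner loop visits 2^(K−1) subsets per region, which is why run times grow so quickly with K.

```python
import math
from typing import Dict, FrozenSet, Set, Tuple


def flexible_windows(
    centroids: Dict[str, Tuple[float, float]],
    neighbors: Dict[str, Set[str]],
    k_max: int,
) -> Set[FrozenSet[str]]:
    """Candidate windows of a flexible-type scan (simplified illustration)."""
    windows: Set[FrozenSet[str]] = set()
    for i, (xi, yi) in centroids.items():
        # The k_max - 1 regions nearest to i (ties broken by name).
        others = sorted(
            (r for r in centroids if r != i),
            key=lambda r: (math.hypot(centroids[r][0] - xi, centroids[r][1] - yi), r),
        )[: k_max - 1]
        # Every subset of the nearest regions, always together with i itself.
        for mask in range(1 << len(others)):
            members = {i} | {others[b] for b in range(len(others)) if mask >> b & 1}
            if _is_connected(members, neighbors):
                windows.add(frozenset(members))
    return windows


def _is_connected(members: Set[str], neighbors: Dict[str, Set[str]]) -> bool:
    """Depth-first search restricted to `members`."""
    start = next(iter(members))
    seen: Set[str] = set()
    stack = [start]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(neighbors[r] & members)
    return seen == members
```

Unlike the circular scan, this admits any connected shape among nearby regions, at the price of exponential growth in the number of subsets examined.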
However, a question arises here: is it not a problem that the hotspot includes Richmond County (county number 77), whose mortality rate is below the average? (The SIDS incidence rate of Richmond (77) is 1.88 per 1000 live births.) A low-rate region is included in the hotspot because the spatial scan statistic, as noted previously, is built on maximizing the LR, so the window tends to absorb neighboring regions even when their risk is not elevated. This implies a danger of misidentification: two originally separate hotspots might be detected as a single hotspot. Tango’s restricted scan statistic (Tango 2008; Tango and Takahashi 2012) offers one solution to this problem. In applying the restricted statistic, we used \(K=5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40\) and 50, together with prespecified significance levels \(\alpha _1=0.10, 0.20, 0.30, 0.40\). Irrespective of the values of K and \(\alpha _1\), the flexible scan with restriction detected the same counties as the true hotspot cluster obtained for \(K=5\). This is evidently because the low-risk Richmond County (77) was excluded from the scanned counties. Under all conditions we tried, the restricted flexible scan finished within 1 s.
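The screening step behind the restricted LR can be sketched as follows. We assume, as a simplification, that each region's excess is judged by the one-sided Poisson p value P(X ≥ o_i) with X ~ Poisson(E_i), and that a window keeps its LLR only if every region in it has a p value below α1 (otherwise its restricted LLR is 0). Tango's exact definition may differ in details, for example through a mid-p adjustment; the function names are hypothetical.

```python
import math
from typing import Dict, Set


def poisson_upper_tail(o: int, e: float) -> float:
    """P(X >= o) for X ~ Poisson(e); adequate for the small e of county data."""
    if o <= 0:
        return 1.0
    term, cdf = math.exp(-e), 0.0
    for x in range(o):  # accumulate P(X = 0), ..., P(X = o - 1)
        cdf += term
        term *= e / (x + 1)
    return max(0.0, 1.0 - cdf)


def eligible_regions(
    observed: Dict[str, int], expected: Dict[str, float], alpha1: float
) -> Set[str]:
    """Regions whose individual excess of cases is significant at level alpha1."""
    return {r for r in observed if poisson_upper_tail(observed[r], expected[r]) < alpha1}
```

A region such as Richmond (77), whose rate is below average, has a p value well above any of the α1 values used, so under this screen it could not belong to a scoring window.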
5 Discussion
In this paper, we proposed a novel approach that uses the ZDD technique to detect the spatial cluster with the truly maximum LR. As an illustration, we used the SIDS data of North Carolina, which are often used in spatial data analysis. First, we exactly counted the number of connected regional patterns consisting of 1–100 counties in North Carolina, to the best of our knowledge for the first time. We then enumerated the actual regional patterns of 11 counties or fewer and detected the true hotspot clusters by calculating the LLR for all of them. This shows that, for data of the scale used in this paper, a cluster with the truly maximum LLR can be found exhaustively through ZDD enumeration. In addition, we investigated the properties of existing scanning methods, including the flexible and echelon scans. Table 5 summarizes the LLRs of the hotspots detected by each method as the maximum cluster size K changes. As expected, for the same K, the ZDD-based scan always detected a hotspot with an LLR at least as high as that of any other method. Furthermore, for \(K = 10\) and \(K = 11\), it is notable that the hotspots detected by the ZDD-based scan, unlike those for smaller K, lay on counties forming a narrow north–south band, as shown in Fig. 12e, f. These unusually shaped hotspots, which have the best LLR, were not detected by any of the other scanning methods discussed in this paper.

Scanning with the ZDD technique always detects the hotspot cluster with the highest likelihood and is ideal for detecting small hotspots.

The original flexible scan works well for detecting medium-sized hotspots, for example those consisting of 20 regions or fewer, and the detected hotspot has a comparatively high likelihood.

The flexible scan with restriction and the echelon scan can detect a hotspot of arbitrary size, because their low computational load does not force a limit on the maximum cluster size. The former is best suited to detecting a hotspot that does not include low-risk regions, and the latter can obtain a hotspot with a high likelihood.
LLRs of the hotspots detected by each method as the maximum cluster size K varies, for SIDS in North Carolina
K  ZDD-based scan (true hotspot)  Flexible scan  Flexible scan with restriction (\(\alpha _1=0.10,0.20,0.30,0.40\))  Echelon scan 

5  25.38  25.38  25.38  25.38 
6  26.79  25.38  25.38  25.38 
7  29.00  25.38  25.38  25.38 
8  30.81  25.38  25.38  25.38 
9  30.81  25.38  25.38  25.38 
10  31.90  30.81  25.38  25.38 
11  34.54  30.81  25.38  25.38 
12  CD  30.81  25.38  25.38 
13  CD  30.81  25.38  25.38 
14  CD  30.81  25.38  25.38 
15  CD  30.81  25.38  25.38 
20  CD  30.81  25.38  25.38 
30  CD  36.37  25.38  42.08 
40  CD  42.16  25.38  42.41 
50  CD  CD  25.38  43.62 
Note that, owing to the nature of the spatial scan statistic, the LR of a detected hotspot may increase when it includes a particular region “A” that merges two or more distinct high-risk clusters into one, even if “A” itself is not high risk. We therefore need to decide carefully, in light of the background of the problem, whether to select “a single hotspot with the maximum LR” or “several separate hotspots with reasonably high LRs”.
This paper discussed hotspot cluster detection focusing only on the LR statistic, but the significance of a detected hotspot must, of course, be judged from the distribution of the statistic. Monte Carlo hypothesis testing (Dwass 1957) is typically used to estimate p values, since the exact distribution of the spatial scan statistic is difficult to obtain. However, we might be able to determine the exact p value using the LR statistics calculated from every possible window obtained with the ZDD. We consider this worthwhile future work.
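A minimal sketch of such a Monte Carlo test, under stated assumptions: the Poisson LLR, a fixed list of candidate windows given as lists of region indices, and null datasets generated by distributing the observed total number of cases over regions in proportion to population (a conditional multinomial null, as is common for the Poisson scan statistic). The p value is the rank of the observed maximum LLR among the replicates, (exceedances + 1)/(replicates + 1). With an exhaustive ZDD-based list of windows the same ranking would apply, although rescanning every window for each replicate may be costly. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def _llr(c: float, e: float, total: float) -> float:
    # Poisson-model LLR (assumed form); 0 unless the window shows an excess.
    if c <= e:
        return 0.0
    out = total - c
    return c * math.log(c / e) + (out * math.log(out / (total - e)) if out > 0 else 0.0)


def max_llr(cases: Sequence[int], expected: Sequence[float], windows: List[List[int]]) -> float:
    total = sum(cases)
    return max(
        _llr(sum(cases[i] for i in w), sum(expected[i] for i in w), total) for w in windows
    )


def monte_carlo_p_value(
    cases: Sequence[int],
    population: Sequence[float],
    windows: List[List[int]],
    replicates: int = 999,
    seed: int = 0,
) -> float:
    """Monte Carlo p value of the maximum LLR over the candidate windows."""
    rng = random.Random(seed)
    total = sum(cases)
    pop_total = sum(population)
    expected = [total * p / pop_total for p in population]
    observed = max_llr(cases, expected, windows)
    exceed = 0
    for _ in range(replicates):
        # Null dataset: the same total number of cases, allocated by population.
        counts = [0] * len(cases)
        for i in rng.choices(range(len(cases)), weights=population, k=total):
            counts[i] += 1
        if max_llr(counts, expected, windows) >= observed:
            exceed += 1
    return (exceed + 1) / (replicates + 1)
```

With 99 replicates, the smallest attainable p value is 0.01, so the number of replicates bounds the resolution of the test.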
Footnotes
 1.
An induced connected component is a connected subgraph that contains every edge of the original graph joining two of its vertices; that is, two vertices of the subgraph are adjacent whenever they are adjacent in the original graph.
 2.
We conducted this experiment on a machine with an Intel Xeon E5-2630 (2.30 GHz) CPU and 128 GB of memory (Linux CentOS 6.6). We implemented the algorithm in C++ and compiled it using gcc with the -O3 optimization option.
Notes
Acknowledgements
This work was partly supported by JSPS KAKENHI Grant Numbers JP16K16019, JP18K04610, JP18H04091 and JP15H05711.
References
Anselin, L. (1995). Local indicators of spatial association: LISA. Geographical Analysis, 27(2), 93–115.
Besag, J. E., & Newell, J. (1991). The detection of clusters in rare diseases. Journal of the Royal Statistical Society, Series A, 154(1), 143–155.
Berke, O. (2004). Exploratory disease mapping: Kriging the spatial risk function from regional count data. International Journal of Health Geographics, 3(1), 18.
Cressie, N. (1992). Smoothing regional maps using empirical Bayes predictors. Geographical Analysis, 24(1), 75–95.
Cressie, N., & Chan, N. H. (1989). Spatial modeling of regional variables. Journal of the American Statistical Association, 84, 393–401.
Cuzick, J., & Edwards, R. (1990). Spatial clustering for inhomogeneous populations. Journal of the Royal Statistical Society, Series B, 52(1), 73–104.
Duczmal, L., & Assunção, R. (2004). A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. Computational Statistics and Data Analysis, 45(2), 269–286.
Dwass, M. (1957). Modified randomization tests for nonparametric hypotheses. Annals of Mathematical Statistics, 28(1), 181–187.
Huang, L., Kulldorff, M., & Gregorio, D. (2007). A spatial scan statistic for survival data. Biometrics, 63(1), 109–118.
Huang, L., Tiwari, R. C., Zuo, Z., Kulldorff, M., & Feuer, E. J. (2009). Weighted normal spatial scan statistic for heterogeneous population data. Journal of the American Statistical Association, 104, 886–898.
Inoue, T., Takano, K., Watanabe, T., Kawahara, J., Yoshinaka, R., Kishimoto, A., et al. (2014). Distribution loss minimization with guaranteed error bound. IEEE Transactions on Smart Grid, 5(1), 102–111.
Ishioka, F., & Kurihara, K. (2012). Detection of spatial clusters using echelon scan. Proceedings of the 20th International Conference on Computational Statistics (COMPSTAT 2012), Heidelberg: Physica-Verlag, 341–352.
Ishioka, F., Kurihara, K., Suito, H., Horikawa, Y., & Ono, Y. (2007). Detection of hotspots for 3-dimensional spatial data and its application to environmental pollution data. Journal of Environmental Science for Sustainable Society, 1, 15–24.
Jung, I., Kulldorff, M., & Klassen, A. C. (2007). A spatial scan statistic for ordinal data. Statistics in Medicine, 26(7), 1594–1607.
Jung, I., Kulldorff, M., & Richard, O. J. (2010). A spatial scan statistic for multinomial data. Statistics in Medicine, 29(18), 1910–1918.
Kawahara, J., Inoue, T., Iwashita, H., & Minato, S. (2017a). Frontier-based search for enumerating all constrained subgraphs with compressed representation. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E100-A(9), 1773–1784.
Kawahara, J., Saitoh, T., Suzuki, H., & Yoshinaka, R. (2017b). Solving the longest one-way-ticket problem and enumerating letter graphs by augmenting the two representative approaches with ZDDs. In S. Phon-Amnuaisuk, T.-W. Au, & S. Omar (Eds.), Computational intelligence in information systems: Proceedings of the Computational Intelligence in Information Systems Conference (CIIS 2016), Cham: Springer, 294–305.
Kawahara, J., Horiyama, T., Hotta, K., & Minato, S. (2017c). Generating all patterns of graph partitions within a disparity bound. In Proceedings of the 11th International Conference and Workshops on Algorithms and Computation (WALCOM 2017), 119–131.
Knuth, D. E. (2011). The Art of Computer Programming, Volume 4A, Combinatorial Algorithms, Part 1 (1st ed.). Addison-Wesley Professional.
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods, 26(6), 1481–1496.
Kulldorff, M., & Harvard Medical School, Boston and Information Management Services Inc. (2018). SaTScan™ v9.6: Software for the Spatial and Space-Time Scan Statistics. http://www.satscan.org/. Accessed 1 July 2018.
Kulldorff, M., Huang, L., & Konty, K. (2009). A scan statistic for continuous data based on the normal probability model. International Journal of Health Geographics, 8, 58.
Kulldorff, M., & Nagarwalla, N. (1995). Spatial disease clusters: Detection and inference. Statistics in Medicine, 14(8), 799–810.
Kurihara, K. (2004). Classification of geospatial lattice data and their graphical representation. In D. Banks et al. (Eds.), Classification, clustering, and data mining applications (pp. 251–258). New York: Springer.
Lawson, A. B., & Clark, A. (2002). Spatial mixture relative risk models applied to disease mapping. Statistics in Medicine, 21(3), 359–370.
Minato, S. (1993). Zero-suppressed BDDs for set manipulation in combinatorial problems. In Proceedings of the 30th ACM/IEEE Design Automation Conference, 272–277.
Moran, P. A. P. (1948). The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B, 10(2), 243–251.
Myers, W. L., Patil, G. P., & Joly, K. (1997). Echelon approach to areas of concern in synoptic regional monitoring. Environmental and Ecological Statistics, 4(2), 131–152.
Openshaw, S., Charlton, M., Wymer, C., & Craft, A. (1987). A mark 1 geographical analysis machine for the automated analysis of point data sets. International Journal of Geographical Information Systems, 1(4), 335–358.
Patil, G. P., & Taillie, C. (2004). Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics, 11(2), 183–197.
Sekine, K., Imai, H., & Tani, S. (1995). Computing the Tutte polynomial of a graph of moderate size. In Proceedings of the 6th International Symposium on Algorithms and Computation (ISAAC 1995), 224–233.
Takahashi, K., Yokoyama, T., & Tango, T. (2010). FleXScan v3.1.2: Software for the Flexible Scan Statistic. National Institute of Public Health, Japan. https://sites.google.com/site/flexscansoftware/. Accessed 1 July 2018.
Takizawa, A., Takechi, Y., Ohta, A., Katoh, N., Inoue, T., Horiyama, T., Kawahara, J., & Minato, S. (2013). Enumeration of region partitioning for evacuation planning based on ZDD. In Proceedings of the 11th International Symposium on Operations Research and its Applications in Engineering, Technology and Management (ISORA 2013), 1–8.
Tango, T. (1995). A class of tests for detecting “general” and “focused” clustering of rare diseases. Statistics in Medicine, 14(21–22), 2323–2334.
Tango, T. (2000). A test for spatial disease clustering adjusted for multiple testing. Statistics in Medicine, 19(2), 191–204.
Tango, T. (2008). A spatial scan statistic with a restricted likelihood ratio. Japanese Journal of Biometrics, 29(2), 75–95.
Tango, T., & Takahashi, K. (2005). A flexible spatial scan statistic for detecting clusters. International Journal of Health Geographics, 4, 11.
Tango, T., & Takahashi, K. (2012). A flexible spatial scan statistic with a restricted likelihood ratio for detecting disease clusters. Statistics in Medicine, 31(30), 4207–4218.