Geodata can be processed in many different ways, each of which requires its own specific input data and parameters to produce a result that suits the respective application. This chapter introduces the most common analyses conducted with a GIS. The available toolsets range from basic tools, such as buffering vector geometries or merging two datasets, to the interpolation of area-wide raster datasets from point data. To explain why and how these toolsets are used, how they are parametrised and what else matters for using them properly, this chapter groups the analyses thematically and illustrates the different approaches to spatial analysis with examples and figures.
Keywords: Spatial analysis, Network analysis, GIS, Spatial statistics, Raster resolution, Geoprocessing, Data conversion
3.1 Simple Spatial Analysis (by Andreas Redecker)
This sub-chapter gives an overview of fundamental GIS methods for performing basic spatial analysis with feature data. Although these methods are simple, they can deliver highly valuable output, depending on the data involved and the workflow that incorporates them. The process of manipulating geodata is called geoprocessing. To automate workflows, all operators involved in an analysis can be combined in a geoprocessing model.
In many cases, not all features of a feature class are supposed to take part in an analysis. The selection of the desired objects can be performed based on the attributes of the features or incorporating their spatial characteristics. Depending on the GIS used, these two different methods can be applied successively or in one process.
Select by Attribute
Linking Tabular Data
If the necessary properties for an attribute-based selection are not held in the attribute table of the feature class itself, they can be linked to it from external tabular data. For this, both tables need to contain a field (column) with matching entries. These must uniquely identify a feature in the attribute table as well as its corresponding data in the table to be linked.
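The linking step and a subsequent attribute-based selection can be sketched in a few lines of plain Python. This is a minimal illustration, not the workflow of any particular GIS: the tables, the field names (`district_id`, `population`) and the helper functions are hypothetical.

```python
# Hypothetical attribute table of a feature class: one record per feature.
features = [
    {"district_id": "D1", "name": "North"},
    {"district_id": "D2", "name": "Centre"},
    {"district_id": "D3", "name": "South"},
]

# Hypothetical external table, keyed by the same unique identifier.
population = {"D1": 12000, "D2": 54000, "D3": 8000}

def join_tables(features, external, key, new_field):
    """Attach external values to each feature via the matching key field."""
    joined = []
    for record in features:
        row = dict(record)
        row[new_field] = external.get(record[key])  # None if no match exists
        joined.append(row)
    return joined

def select_by_attribute(records, predicate):
    """Return only the records for which the predicate is true."""
    return [r for r in records if predicate(r)]

joined = join_tables(features, population, "district_id", "population")
selected = select_by_attribute(
    joined, lambda r: r["population"] is not None and r["population"] > 10000
)
selected_names = [r["name"] for r in selected]
```

The join only works because `district_id` identifies exactly one record in each table, which is the uniqueness requirement described above.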
Select by Location
3.1.2 Single Feature Class Operations
To prepare features for further analysis or to better visualise results, two major operations are available to change the structure of single feature classes.
3.1.3 Overlay Operations
These operations combine two or more feature classes to gain new geo-datasets incorporating the extent of the features involved.
The union operator combines the polygon features of two or more feature classes. It does not create overlapping features. Instead, it splits overlapping parts of features into subarea features and assigns the attributes of all involved objects to each new feature.
Symmetrical Difference
The result of this function contains only those areas of the input features that do not overlap. Hence, it gives the same result as a union operation minus the result of an intersect operation.
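The relationship between these overlay operations can be checked with a simple analogy. In the sketch below, each polygon is approximated by the set of grid cells it covers (an assumption made only for illustration; a GIS works on the vector geometry itself and also transfers attributes).

```python
# Two hypothetical polygons, each approximated by the set of grid cells it covers.
polygon_a = {(x, y) for x in range(0, 4) for y in range(0, 4)}
polygon_b = {(x, y) for x in range(2, 6) for y in range(2, 6)}

union = polygon_a | polygon_b                    # area covered by either input
intersect = polygon_a & polygon_b                # area covered by both inputs
symmetrical_difference = polygon_a ^ polygon_b   # area covered by exactly one input

# The symmetrical difference equals the union minus the intersect.
same_result = symmetrical_difference == union - intersect
```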
3.2 Raster Analysis (by Jaroslav Burian)
Raster analysis (as part of spatial analysis) refers to analytical operations with raster data. Map algebra (mathematical operations with rasters) is used to process these data. GIS offers many raster analysis options, such as hydrologic analysis, multi-criteria analysis, terrain analysis, surface modelling, surface interpolation, suitability modelling, statistical analysis and image classification (processing of remote sensing data). Most of the application fields cover environmental issues (e.g. climate change, weather forecasting, flood modelling), but some focus on economic aspects (e.g. modelling of renewable energy potential, land suitability modelling, cost-distance analysis and many others).
3.2.1 Raster Data
3.2.2 Map Algebra
3.2.3 Raster Operators
As part of map algebra, operators and functions of mathematical language are used for data processing. Operators perform mathematical calculations with one or more raster layers. The basic operators are arithmetic (+, −, ∗, /): rasters can be added, subtracted, multiplied or divided, and the same operations can be applied to a single layer. In addition to arithmetic operators, there are Boolean (true, false), relational (greater than, smaller than or equal to), statistical (minimum, maximum, average and median), trigonometric (sine, cosine, tangent, arcsine), exponential and logarithmic operators.
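The cell-by-cell character of these operators can be illustrated with a short sketch. The two rasters, their values and the `cellwise` helper are hypothetical; a real GIS would apply the same idea to full raster layers.

```python
import math

# Two hypothetical 2 x 3 rasters with identical extent and cell size.
elevation = [[100.0, 120.0, 140.0],
             [110.0, 130.0, 150.0]]
rainfall = [[10.0, 20.0, 30.0],
            [40.0, 50.0, 60.0]]

def cellwise(func, *rasters):
    """Apply func to the co-located cells of one or more rasters."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [[func(*(r[i][j] for r in rasters)) for j in range(cols)]
            for i in range(rows)]

total = cellwise(lambda a, b: a + b, elevation, rainfall)       # arithmetic
high = cellwise(lambda a: a > 125.0, elevation)                 # relational
wet_and_high = cellwise(lambda a, b: a > 125.0 and b >= 50.0,
                        elevation, rainfall)                    # Boolean
log_rain = cellwise(math.log10, rainfall)                       # logarithmic
```

Each output raster has the same shape as the inputs, because every result cell is computed only from the cells at the same position.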
3.2.4 Raster Functions
Tomlin (1994) classifies all GIS transformations of rasters into four basic classes, and this classification is used in several raster-centric GISs as the basis for their analysis languages. Depending on whether a function works with only one raster cell or with more, the classes are local, focal, zonal and global. To combine their inputs, map algebra functions follow some rules concerning spatial resolution, a common coordinate system and the mathematical operators used.
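The four classes differ in which cells contribute to an output value. The following sketch shows one example of each class on a small hypothetical raster; the chosen statistics (doubling, mean, sum, maximum) and the 3 x 3 window are assumptions made for illustration.

```python
values = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
zones = [["a", "a", "b"],
         ["a", "b", "b"],
         ["c", "c", "c"]]
rows, cols = len(values), len(values[0])

# Local: each output cell depends only on the same cell of the input.
local_double = [[v * 2 for v in row] for row in values]

# Focal: each output cell depends on a neighbourhood (here a 3 x 3 window,
# clipped at the raster edge).
def focal_mean(i, j):
    window = [values[r][c]
              for r in range(max(0, i - 1), min(rows, i + 2))
              for c in range(max(0, j - 1), min(cols, j + 2))]
    return sum(window) / len(window)

focal = [[focal_mean(i, j) for j in range(cols)] for i in range(rows)]

# Zonal: cells are grouped by the zone raster and summarised per zone.
zonal_sum = {}
for i in range(rows):
    for j in range(cols):
        zonal_sum[zones[i][j]] = zonal_sum.get(zones[i][j], 0) + values[i][j]

# Global: the output depends on all cells of the input.
global_max = max(v for row in values for v in row)
```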
3.2.5 Selected Raster Analysis
Raster operators and raster functions can be applied to many different raster datasets to perform a wide range of raster analyses. For the purpose of this book, only a few selected analyses are described.
Surface Analysis
Many raster analyses deal with surfaces. Surfaces represent phenomena that have values at every point across their extent. Surfaces are derived from a limited set of sample values (e.g. elevation points, meteorological stations). A typical surface represents elevation, temperature, precipitation or another continuous phenomenon. Surfaces can be represented by contour lines, points or TINs (triangulated irregular networks); however, most surface analysis in GIS is done on raster data.
Surface analysis involves several kinds of processing, including extracting new surfaces from existing surfaces, reclassifying surfaces, and combining surfaces (ESRI 2018a). The most common surface analyses (slope, aspect, hillshade, viewshed and watershed) are applied to elevation data (terrain surfaces, i.e. digital elevation models).
Hillshade (Illumination)
Viewshed (Visibility Analysis)
Cost Distance Analysis (Least-Cost Path)
Solar Radiation (Insolation) Analysis
Multi-Criteria Analysis
3.3 Network Analysis (by Nicolai Moos)
Most people are familiar with using a navigation system, which means that they have at least once performed a basic network analysis by looking for the shortest path or fastest route from their own location to another one. Since GIS software is a lot more sophisticated than a common navigation system and consequently offers many more possibilities, this chapter characterises the fundamental functions and approaches of network analysis in GIS.
Furthermore, a working network dataset needs information on the directions linked to each street segment, as there are one-way streets as well as streets that can be driven on in both directions. If the outcome of the analysis is to take a time or street capacity component into account, these figures also have to be included as impedances when constructing the network dataset (Chang 2010).
Once the network dataset has been built, several different kinds of calculation can be performed in a network analysis.
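How directions and impedances enter a network dataset can be sketched as follows. The street segments, node names and travel times are hypothetical, and Dijkstra's algorithm is used here only as one common way to find a least-impedance route; the chapter does not prescribe a particular algorithm.

```python
import heapq

# Hypothetical street segments: (from_node, to_node, travel_time_min, one_way).
segments = [
    ("A", "B", 4.0, False),
    ("B", "C", 3.0, False),
    ("A", "C", 9.0, False),
    ("C", "D", 2.0, True),   # one-way: passable from C to D only
    ("D", "A", 1.0, True),   # one-way: passable from D to A only
]

def build_network(segments):
    """Turn segments into a directed adjacency list weighted by impedance."""
    graph = {}
    for start, end, impedance, one_way in segments:
        graph.setdefault(start, []).append((end, impedance))
        graph.setdefault(end, [])
        if not one_way:
            graph[end].append((start, impedance))
    return graph

def shortest_path(graph, origin, destination):
    """Dijkstra's algorithm; returns (total impedance, list of nodes)."""
    queue = [(0.0, origin, [origin])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, impedance in graph[node]:
            if neighbour not in visited:
                heapq.heappush(queue, (cost + impedance, neighbour, path + [neighbour]))
    return float("inf"), []

network = build_network(segments)
cost_out, path_out = shortest_path(network, "A", "D")
cost_back, path_back = shortest_path(network, "D", "A")
```

Because of the one-way segments, the route from A to D differs from the route back, which is why direction information is part of the network dataset.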
3.3.2 Optimal Routes
3.3.3 Traveling Salesman Problem
The traveling salesman problem arises not only in classical spatial analyses but also in various other fields, such as logistics or the design of products that contain several different spots which need to be connected in a specific order. The most relevant aspect of this network analysis is the efficiency of the route, regardless of whether a real person is travelling or any other subject is moving between several different locations (Curtin 2007).
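For a handful of stops, the most efficient round trip can be found by simply trying every visiting order. The sketch below does this for hypothetical locations and travel costs. Brute force is an assumption made for clarity: the number of orders grows factorially, so software for larger problems typically relies on heuristics.

```python
from itertools import permutations

# Hypothetical symmetric travel costs between four locations.
cost = {
    ("depot", "a"): 2, ("depot", "b"): 9, ("depot", "c"): 10,
    ("a", "b"): 6, ("a", "c"): 4, ("b", "c"): 3,
}

def travel_cost(u, v):
    """Look up the cost of a leg regardless of travel direction."""
    return cost[(u, v)] if (u, v) in cost else cost[(v, u)]

def best_round_trip(start, stops):
    """Try every visiting order and keep the cheapest closed tour."""
    best_cost, best_tour = float("inf"), None
    for order in permutations(stops):
        tour = (start,) + order + (start,)
        total = sum(travel_cost(u, v) for u, v in zip(tour, tour[1:]))
        if total < best_cost:
            best_cost, best_tour = total, tour
    return best_cost, best_tour

tour_cost, tour = best_round_trip("depot", ["a", "b", "c"])
```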
3.3.4 Service Areas
3.3.5 Location-Allocation Analysis
How can we save money on transportation? Where should we place a new facility? How big is the potential area that is covered by the store? Are stores reachable for all customers in a certain amount of time? The location-allocation analysis offers several approaches, such as minimising impedances or the number of facilities, or maximising area coverage, accessibility or market share (Chang 2010). It therefore combines the different methods of network analysis. Each of these tasks requires the preparation of an analysis layer that can calculate the optimal location for the particular application.
Necessary inputs for this layer are a network dataset as well as facility locations and demand points. The facilities are split up into candidate facilities that represent the potential location of a new facility, competitor facilities that mark the existing sites of present competitors and required facilities that represent existing sites of say one’s own organization. Demand points are locations that represent the different factors that determine the grade of suitability for a new candidate facility. These can be centroids of districts or other administrative units as well as different kinds of demand profiles like accumulation of students, families or workers of a certain business. The demand points contain information like income, age, social status, etc.
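The approach of minimising impedance can be sketched for the simplest case of opening a single facility. All names and numbers below are hypothetical: the travel times stand in for values that a network analysis would deliver, and the demand weights stand in for attributes of the demand points.

```python
# Hypothetical travel times (minutes) from each demand point to each candidate
# facility, as they might be read from a network analysis.
travel_time = {
    "district_1": {"site_x": 5, "site_y": 12},
    "district_2": {"site_x": 10, "site_y": 4},
    "district_3": {"site_x": 8, "site_y": 6},
}
# Hypothetical demand weights, e.g. number of potential customers.
demand = {"district_1": 100, "district_2": 300, "district_3": 200}

def weighted_impedance(site):
    """Total demand-weighted travel time if only this site were opened."""
    return sum(demand[d] * travel_time[d][site] for d in demand)

candidates = ["site_x", "site_y"]
scores = {site: weighted_impedance(site) for site in candidates}
best_site = min(scores, key=scores.get)
```

Competitor and required facilities are left out of this sketch; they would enter as additional fixed sites that demand can be allocated to.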
3.3.6 Origin-Destination Matrices
For the creation of an origin-destination (OD) matrix, it is essential to define a set of origin point features and a set of destination point features, all located within the network dataset. The analysis settings can vary in terms of impedances, barriers in the network, the point in time and other parameters that influence the result depending on the properties of the network dataset (Curtin 2007).
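An OD matrix can be thought of as a table holding the least accumulated impedance for every pair of origin and destination. The following sketch builds such a table for a small hypothetical directed network; node names, impedances and the use of Dijkstra's algorithm are assumptions made for illustration.

```python
import heapq

# Hypothetical directed network: node -> list of (neighbour, impedance).
network = {
    "o1": [("n1", 2.0)],
    "o2": [("n1", 5.0), ("d2", 4.0)],
    "n1": [("d1", 3.0), ("d2", 6.0)],
    "d1": [],
    "d2": [],
}

def costs_from(graph, origin):
    """Least accumulated impedance from origin to every reachable node."""
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # outdated queue entry
        for neighbour, impedance in graph[node]:
            new_cost = cost + impedance
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best

def od_matrix(graph, origins, destinations):
    """Nested dict: matrix[origin][destination] = least impedance."""
    matrix = {}
    for origin in origins:
        reached = costs_from(graph, origin)
        matrix[origin] = {d: reached.get(d, float("inf")) for d in destinations}
    return matrix

matrix = od_matrix(network, ["o1", "o2"], ["d1", "d2"])
```

Destinations that cannot be reached from an origin, for example because of a barrier, would appear with an infinite impedance.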
3.4 Spatial Statistics (by Karel Macků)
In the context of Tobler’s first law of geography, which states that “Everything is related to everything else, but near things are more related than distant things”, spatial statistics is a set of exploratory techniques for describing and modelling spatial distributions, patterns, processes and relationships. This group of analyses is necessary for a deeper understanding of spatial data, which is gained through the use of statistical methods. In this chapter, the most frequently used methods of spatial statistics are briefly introduced.
Spatial statistics is a subcategory of spatial data analysis which is closely linked to mathematical statistics. Spatial statistics is a set of exploratory techniques for describing and modelling spatial distributions, patterns, processes and relationships (Bennett et al. 2017). According to Haining (2003), some spatial analyses include mathematical modelling where model outcomes depend on the spatial interaction between objects in the model, or on the spatial relationships or geographical positioning of objects within the model. This statement captures the difference between simple spatial analyses and more advanced methods that approach the tasks using a mathematical and statistical apparatus. Why do events happen at their locations and not elsewhere? Is there any association with the environment? Are the events spread out or clustered in an area? With proper data, these types of questions can be answered with spatial statistics.
Spatial statistics methods are based on the assumption that elements that are close to each other are also more closely related. A direct link to Tobler’s first law of geography can be observed here: “Everything is related to everything else, but near things are more related than distant things” (Tobler 1970, p. 236). Spatial statistics can also be viewed as a complementary tool to spatial data analysis: it offers a mathematical apparatus and methods for evaluating spatial information, whereas geography or another spatial science formulates the hypotheses or identifies the key parameters of the spatial data (Getis 2005). When a high degree of certainty is sought, the statistical approach is always recommended.
There should be no confusion between the terms spatial statistics and geostatistics – geostatistics is one of the spatial statistics sub-disciplines and has emerged as a tool for a probability prediction of the distribution of ore deposits in the mining industry (Longley et al. 2010).
Spatial statistics includes methods based on the stochastic (i.e. random) nature and pattern of phenomena. These tasks can be divided into descriptive ones (producing essential information about a set of elements) and inferential ones (analysing patterns and the behaviour of spatial data). The latter type of analysis is the subject of this chapter.
3.4.1 Pattern Analysis
At the beginning, the term ‘cluster’ should be clarified. Clustering is a global property of the spatial pattern in a dataset, measured by a single statistic (Anselin 2005). A cluster, then, is a group of features whose values and/or locations are closer together than they would be by chance. The purpose of pattern analysis is to determine whether the spatial behaviour of the geographic elements follows one of the above-mentioned options and whether this behaviour is statistically demonstrable. The actual spatial distribution is therefore tested against one of these options. Confirming the existence of significant clusters of similar values, or of clusters of points near one another, is one of the most common tasks. Such a task could be based only on a visual analysis of mapped data; however, spatial statistics underpins this assessment with numerical tests and makes it more reliable. The resulting finding helps to understand the behaviour of the observed phenomenon and to support the hypotheses that explain this behaviour. The following sections describe selected spatial pattern analyses.
3.4.2 Point Patterns
Ripley’s K Function
The K function is one of the methods for assessing the randomness of the distribution of a set of point data. It shows whether the elements appear to be dispersed, clustered or randomly distributed throughout the area of interest. The method counts the occurrences of points in a defined space, for example the area within distance d of each point. The K function is defined as the ratio between the number of points observed in the defined area (a grid cell or a circle of radius d) and the density of points per unit area that would be expected under a random distribution of the elements (most often represented by the homogeneous Poisson process, also known as complete spatial randomness). This principle allows deviations from a spatially random distribution to be identified (Dixon et al. 2002). If the number of observed points within a given area is higher than for a random distribution, the distribution is clustered. If the number is smaller, the distribution is dispersed (Gillan and Gonzalez).
As an example, the locations of small and medium enterprises in the Olomouc region were analysed with the K function. For such data, it is expected that companies are concentrated at particular sites, which means they will be clustered within the city.
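The comparison between observed and expected counts can be sketched with a naive estimator of K for a single distance d. The estimator below ignores edge correction, and the point coordinates and the size of the study area are hypothetical; under complete spatial randomness the expected value is πd².

```python
import math

def ripley_k(points, d, area):
    """Naive K estimate (no edge correction) for distance d."""
    n = len(points)
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                pairs += 1   # ordered pair of points closer than d
    return area * pairs / (n * (n - 1))

# Hypothetical points in a 10 x 10 study area: two tight groups.
clustered = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5),
             (8.0, 8.0), (8.5, 8.0), (8.0, 8.5)]
d = 1.0
k_observed = ripley_k(clustered, d, area=100.0)
k_random = math.pi * d ** 2   # expectation under complete spatial randomness
is_clustered = k_observed > k_random
```

In practice, K is evaluated for a whole range of distances and compared with an envelope obtained from simulated random patterns rather than with a single threshold.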
Kernel Density
Kernel smoothing methods are used to transform data from a discrete representation (geolocated points) into a continuous surface. This process is particularly useful for interpreting how a variable varies across space. The kernel density estimate works with localised data, which are used to express a spatially smoothed estimate of the local intensity of the occurrence of objects or events. This locally smoothed intensity can also be understood as a surface of the risk of occurrence of these objects or events (INSEE Eurostat 2018). The application to spatial data is based on density estimation, i.e. estimating a function that describes the occurrence of values from observed data (Silverman 1986).
Conceptually, a smoothly curved surface is fitted over each point. The surface value is highest at the location of the point and diminishes with increasing distance from the point (ESRI 2018c). The final surface is created by estimating the intensity at any location using an appropriate probability density function (K, the kernel function). It is necessary to determine the area within which the algorithm assesses the density of the phenomenon. This search radius, the so-called bandwidth, can be derived from the input points, for example from the median distance between their centre and all input points. The bandwidth essentially determines the degree of smoothing of the resulting surface. Different kernel functions can be used, and they lead to different density estimates. The implementation for spatial data in ArcMap uses the quartic function, which approximates the normal distribution.
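A density estimate with a quartic kernel can be sketched as follows. The normalising constant 3/π is a common choice for the two-dimensional quartic kernel and is an assumption here, as are the point coordinates and the bandwidth; the sketch is not meant to reproduce the output of any specific software.

```python
import math

def quartic_density(x, y, points, bandwidth):
    """Kernel density at (x, y) using a quartic kernel with the given bandwidth."""
    total = 0.0
    for px, py in points:
        u = math.hypot(x - px, y - py) / bandwidth
        if u < 1.0:   # points outside the bandwidth add nothing
            total += (3.0 / math.pi) * (1.0 - u * u) ** 2
    return total / bandwidth ** 2

# Hypothetical point events.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
at_point = quartic_density(0.0, 0.0, points, bandwidth=2.0)
far_away = quartic_density(10.0, 10.0, points, bandwidth=2.0)
```

Evaluating the function at the centre of every raster cell yields the density surface; a larger bandwidth spreads each point's contribution further and therefore gives a smoother result.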
3.4.3 Spatial Autocorrelation
The previous section described how clusters of point phenomena can be identified based on their location. A subsequent task can be to identify clusters based on the location combined with the value of the observed phenomenon. Such an analysis makes it possible to evaluate whether elements that are spatially close have similar values of the observed phenomenon and together form high-value or low-value clusters, or whether the elements are located at random. In the natural world, we expect the environment to have some influence on the monitored phenomenon. For example, when analysing an economically strong region in terms of GDP per capita, we naturally assume that the regions in its immediate neighbourhood will be similar, as the whole area is characterised by similar conditions. Similarly, we expect these regions to differ from other, more remote areas. To support such a claim, an analysis of spatial autocorrelation can be used.
Spatial autocorrelation is a correlation between the values of one variable. It allows the degree of similarity between an object and the objects in its neighbourhood to be evaluated and compared with more remote objects (Cliff and Ord 1973). First, it is necessary to define the relations of each object with its surrounding objects, which is done with matrices of spatial weights. Here, the distance between objects enters as a weight for defining spatial relationships: the autocorrelation of neighbouring objects has more importance than the autocorrelation of distant objects. If positive autocorrelation occurs, we conclude that objects with similar values are located near each other, forming spatial clusters of similar values. Negative autocorrelation indicates the proximity of different values, and autocorrelation around zero indicates randomness in the spatial distribution of values.
Autocorrelation can be quantified by several measures, for example Moran’s I or Geary’s criterion. For Moran’s I, a positive index value indicates positive autocorrelation and a negative value indicates negative autocorrelation. These indicators, however, measure autocorrelation only at the global level, that is, for the whole area of interest. If these tests indicate autocorrelation, it makes sense to ask how the autocorrelation varies in space. A local test, LISA (Local Indicators of Spatial Association), serves this purpose. Since the method of identifying spatial autocorrelation is based on traditional statistical methods, the calculation is complemented by its statistical significance, represented by a p-value. This makes it possible to assess whether the result obtained is statistically significant or not.
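The global Moran’s I can be computed directly from the values and a spatial weights matrix. The sketch below uses four hypothetical regions arranged in a row and binary contiguity weights (1 for direct neighbours, 0 otherwise), which is a simplifying assumption compared with distance-based weights. The significance test is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)   # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

# Four hypothetical regions in a row; neighbours share a border.
weights = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]

similar_neighbours = morans_i([10, 9, 2, 1], weights)    # high next to high
alternating = morans_i([10, 1, 10, 1], weights)          # high next to low
```

The first arrangement gives a positive index (similar values lie side by side), the second a negative one (each region differs from its neighbours).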
The initial analysis of autocorrelation reveals spatial dependence, so it is known that clusters of high and low values occur in the area of interest. Local Moran’s I can be visualised to identify these areas. However, it is still unknown whether a high value of autocorrelation means clustering of high or of low values. For a deeper understanding of the phenomenon, it is possible to plot the observed variable against the average value in its surroundings; this is what the Moran scatterplot shows (Anselin 1996).
Output similar to that provided by LISA is also available with Getis-Ord Gi∗. The main difference is that for LISA, the value of the feature being analysed is not included in the analysis; only neighbouring values are. In contrast, when the local analysis is done with Getis-Ord Gi∗, the value of each feature is included in its own analysis (Getis and Ord 2010). The local sum for a feature and its neighbours is compared proportionally to the sum of all features; when the local sum is very different from the expected local sum, and when that difference is too large to be the result of random chance, a statistically significant z-score results (ESRI 2018d).
The output of this indicator is the so-called z-score for each analysed object. The higher (positive) the z-score, the higher the intensity of clustering of high values in the area (a so-called hot spot), and vice versa: the smaller (negative) the z-score, the higher the intensity of clustering of low values (a so-called cold spot).
An example demonstrating the use of spatial autocorrelation methods is the analysis of the economically strongest and weakest regions in Europe. The monitored variable is GDP, and the spatial units are NUTS 3 regions.
The user can now state that, regarding the spatial distribution of GDP, there is a large cluster of low values in eastern Europe and several small clusters of high values in central Europe, Sweden and the UK. In the rest of the area of interest, GDP values have a random distribution without statistically significant patterns.
As mentioned in the introduction to the chapter, the term spatial statistics is often confused with the term geostatistics. In the narrower sense, geostatistics refers only to a set of interpolation algorithms, that is, algorithms used to estimate the values of a continuous phenomenon or its intensity at any location of the study area where no measurements have been made. A continuous character is typical of environmental phenomena such as temperature, air pressure or soil concentration. In the context of economic data, there are few applications, so this topic is not discussed further.
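The z-score can be sketched with the commonly documented form of the Gi∗ statistic. The five regions, their values and the binary weights are hypothetical; each region counts itself and its direct neighbours with weight 1, which reflects the inclusion of the feature's own value described above.

```python
import math

def getis_ord_gi_star(values, weights):
    """Gi* z-score per feature; weights[i][i] is non-zero (self included)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        local_sum = sum(w[j] * values[j] for j in range(n))
        w_sum = sum(w)
        w_sq_sum = sum(wj * wj for wj in w)
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append((local_sum - mean * w_sum) / denominator)
    return scores

# Five hypothetical regions in a row with high values on the left.
values = [10, 10, 5, 1, 1]
n = len(values)
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]

z_scores = getis_ord_gi_star(values, weights)
```

The leftmost region receives a clearly positive z-score, the rightmost a clearly negative one, and the region in the middle a value close to zero. Whether such scores are statistically significant depends on the chosen confidence level.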
- Albrecht, J. (2005). Geographic information science. http://www.geography.hunter.cuny.edu/~jochen/gtech361/.
- Anselin, L. (1995). Local indicators of spatial association – LISA. Geographical Analysis, 27, 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x
- Anselin, L. (1996). The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In Spatial analytical perspectives on GIS (pp. 111–125). London: Taylor and Francis.
- Anselin, L. (2005). Spatial statistical modeling in a GIS environment. In D. J. Maguire, M. Batty, & M. Goodchild (Eds.), GIS, spatial analysis, and modeling (1st ed.). ESRI Press.
- Bennett, L., Vale, F., D’Acosta, J. (2017). Spatial statistics: Simple ways to do more with your data. https://www.esri.com/arcgis-blog/products/product/analytics/spatial-statistics-resources/
- Chang, K. T. (2010). Introduction to geographic information systems (5th ed.). New York: McGraw-Hill.
- Cliff, A. D., & Ord, J. K. (1973). Spatial autocorrelation. London: Pion Ltd.
- Curtin, K. (2007). Network analysis in geographic information science: Review, assessment, and projections. In Cartography and geographic information science (Vol. 34, pp. 103–111). London: Taylor and Francis.
- DeMers, M. (2008). Fundamentals of geographic information systems (4th ed.). Hoboken: Wiley.
- Dixon, P. M., El-shaarawi, A. H., & Piegorsch, W. W. (2002). Ripley’s K function. Encyclopedia of Environmetrics, 3, 1796–1803.
- ESRI. (2018a). ArcGIS Desktop Help. http://desktop.arcgis.com/en/arcmap/10.3
- ESRI. (2018b). What is a network dataset? https://desktop.arcgis.com/en/arcmap/latest/extensions/network-analyst/what-is-a-network-dataset.htm. Accessed 28 Dec 2018.
- ESRI. (2018c). How Kernel density works—Help | ArcGIS desktop. http://pro.arcgis.com/en/pro-app/tool-reference/spatial-analyst/how-kernel-density-works.htm. Accessed 22 Nov 2018.
- ESRI. (2018d). How hot spot analysis (Getis-Ord Gi∗) works. http://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/h-how-hot-spot-analysis-getis-ord-gi-spatial-stati.htm. Accessed 23 Oct 2018.
- Getis, A. (2005). Spatial statistics. In New developments in geographical information systems: Principles, techniques, management and applications (2nd ed., pp. 239–252). New York: Wiley.
- Getis, A., & Ord, J. K. (2010). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24, 189–206. https://doi.org/10.1111/j.1538-4632.1992.tb00261.x
- Gillan, J., Gonzalez, L. Ripley’s K Function and pair correlation function. http://wiki.landscapetoolbox.org/doku.php/spatial_analysis_methods:ripley_s_k_and_pair_correlation_function. Accessed 22 Oct 2018.
- INSEE Eurostat. (2018). Handbook of spatial analysis.
- Longley, P. A., Goodchild, M. F., Maguire, D. J., & Rhind, D. W. (1999). Geographical information systems: Principles and technical issues.
- Longley, P. A., Goodchild, M., Maguire, D. J., & Rhind, D. W. (2010). Geographic information: Systems and science (3rd ed.). Wiley Publishing.
- Silverman, B. W. (1986). Density estimation for statistics and data analysis. Monogr Stat Appl Probab.
- Tomlin, D., & Berry, J. K. (1979). A mathematical structure for cartographic modeling in environmental analysis. In Proceedings of the 39th symposium of the American conference on surveying and mapping.
- w3schools.com. (2018) Retrieved December 30, 2018, from, https://www.w3schools.com/sql/sql_where.asp
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.