3.1 Introduction

Change detection is an image processing technique that implies the availability of images acquired at sufficiently different dates and the ability to detect all the significant changes that occurred between two of these dates. Change detection using Earth Observation (EO) data for any specific application induces some additional constraints on the general statement above, specifically in two ways.

  • Temporal resolution, i.e. the largest time difference between two acquired images that can still be considered “enough”, as per the previous definition, to detect a change in one or more elements of the urban area. Temporal resolution is defined by the airborne or spaceborne EO sensor used, but also by the specific change that is the focus of the analysis.

  • Spatial resolution, i.e. the level of geographical details required to be able to detect a change that happened, obviously related to the geographical scale of the changing event, the size of the change results that we may want to observe, and the “significance” of these details with respect to the specific urban structures whose changes we want to track. Once again, a suitable (range of) spatial resolution depends on the sensors but also on the change to be tracked.

Applying this definition to change detection in urban areas therefore requires specifying the spatial scale of the change (for instance, through the size of the objects that are affected by it) as well as its temporal scale. For the spatial scale, the change may be considered at the scale of the whole urban area, at the block or road/infrastructure network scale, or at the building/road element scale. Roughly speaking, these scales correspond to spatial resolutions in the range between 100 and 10 m, from 10 to 5 m, and from 2.5 to less than 1 m, respectively. Temporal scales correspond to a wide range of situations, too. Trend analysis for urban areas may require temporal samples at intervals of years, a few months or even a few days/hours, depending on the structure that is monitored (e.g., urban extents, land uses and road traffic, respectively).

3.2 Urban Changes at Different Spatial and Temporal Scales

There is a clear connection between spatial and spectral resolution of EO data and the mapping task they can be used for. As graphically represented in Fig. 3.1, different sensors and data sets support (very) different mapping tasks in urban areas. For instance, for urban extent mapping purposes (from local up to the global scale), single-band optical and/or radar data are enough, while building outlines may be recognized only in VHR data sets. Please note that in Fig. 3.1 “very high spatial resolution” means a pixel posting of 1 m or less, “high resolution” from 3 to 5 m, “medium resolution” from 10 to 100 m, and “moderate resolution” more than 100 m (typically 250, 500 or 1000 m).

Fig. 3.1 Graphical representation of urban mapping tasks related to the spectral and spatial resolutions of the EO data sets at hand

By adding the temporal dimension to the graph in Fig. 3.1, additional options arise, according to the temporal behavior of the phenomenon to be investigated. Sudden events, like natural disasters, require fast sampling during a selected time period, while long-term events, like urbanization or land use changes, can be monitored with a less dense time sequence of EO data sets, and with the correspondingly useful spatial resolution (e.g., single-band medium resolution and multi-band high resolution, according to the figure above).

The combination of spectral, spatial and temporal requirements determines in turn some constraints on the data sets and the algorithms that can be used. In this chapter, after a preliminary survey of the technical literature on urban change detection, we focus on a couple of examples, i.e. site-specific sudden change detection mapping and long-term city-wide trend analysis. In the following, therefore, we first offer a brief overview of the most common approaches presented in the technical literature for site-specific (or “hotspot”) change detection and for long-term trend analyses. Then, some specific examples of processing chains and algorithmic solutions to these two problems in urban areas, exploiting their specific spatial and temporal scales, are introduced and discussed.

3.2.1 Hotspot Change Detection

Hotspot monitoring is an application of urban change detection mainly devoted to the characterization of short-term changes in well-defined areas. In this sense it is very similar to pollution, fire or deforestation detection. It is usually an unsupervised change detection problem, where we are more interested in knowing that a change took place than in understanding what change happened. Sometimes the amount of change is enough to provide the user with a reasonable interpretation (e.g., the amount of damage after a disaster can be used to infer which areas need more help). Sometimes, instead, a more precise classification of the changes is required, and the above-mentioned assumption no longer holds. In any case, if no particular requirement is posed on the changes to be detected, pixel-based or parcel-based unsupervised comparison methods are enough to reveal the extent and location of changes in the observed area. This is especially true when using medium resolution satellites, which provide low-cost data that may be co-registered and corrected using standard techniques already available in commercial off-the-shelf (COTS) software. Indeed, even for sudden changes, medium resolution sensors (like those on board Landsat-8 and, in the future, the Sentinel missions by ESA) are good enough to detect changes in urban areas under surveillance, while the analysis of the actual change requires very high resolution (VHR) imagery. When the focus is on particular locations and urban (infra)structures, VHR data are mandatory. In this case more specific area surveillance techniques, suited to this particular target detection task, can provide direct and better results. This is the reason why in this section we treat both “generic” unsupervised change detection approaches and area surveillance methods.

Unsupervised change detection may be obtained by very simple combinations of the raw images at two dates. The combination depends on the sensor characteristics, and is usually prone to errors due to misregistration and miscalibration. Basic methods comprise differencing or ratioing (the latter is better if SAR data are used), as in Rignot and van Zyl (1993). Alternatively, indexes may be extracted from the data, like the Normalized Difference Vegetation Index (NDVI). Similarly, Zha et al. (2003) propose a new index, the Normalized Difference Built-up Index (NDBI), which seems to be a better descriptor than NDVI for urban areas. This index exploits the fact that urban areas and barren soil show a far larger increment in digital number values from band 4 to band 5 (medium infrared) of Landsat TM. A comparison with maximum likelihood classification results shows that NDBI provides better results. A slightly more complex method is Change Vector Analysis (CVA), which allows tracking how the change affected each band, and thus recognizing what happened. In Johnson and Kasischke (1998) CVA is applied to some examples, among which the monitoring of urban expansion near Seattle. A maximum likelihood classification then allows extracting information on the nature of the change. Finally, Grey et al. (2003) discuss a very interesting analysis of a multi-temporal SAR sequence using interferometric measures. The paper shows that it is possible to detect changes in built areas using satellite SAR data and differencing the coherence between SAR images. Results are validated against GIS land map layers in the Cardiff area, UK.
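The simple two-date combinations mentioned above can be sketched in a few lines. The following is a minimal illustration only, assuming co-registered and calibrated NumPy arrays; the function names are hypothetical and do not come from any of the cited papers. NDBI is written as the normalized difference between Landsat TM band 5 and band 4, the log-ratio is a common way of ratioing SAR intensities, and the CVA function returns only the change magnitude and the per-band difference vector.

```python
import numpy as np

def normalized_difference(a, b, eps=1e-12):
    """Generic normalized difference (a - b) / (a + b), guarded against 0/0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b + eps)

def ndbi(tm4, tm5):
    """NDBI from Landsat TM band 4 (near infrared) and band 5 (medium infrared)."""
    return normalized_difference(tm5, tm4)

def log_ratio(img_t1, img_t2, eps=1e-12):
    """Log-ratio of two intensity images, often preferred to differencing for SAR."""
    t1 = np.asarray(img_t1, dtype=float)
    t2 = np.asarray(img_t2, dtype=float)
    return np.log((t2 + eps) / (t1 + eps))

def change_vector(stack_t1, stack_t2):
    """Change Vector Analysis on two (bands, rows, cols) stacks.

    Returns the per-pixel change magnitude and the per-band difference,
    whose signs indicate how each band was affected.
    """
    delta = np.asarray(stack_t2, dtype=float) - np.asarray(stack_t1, dtype=float)
    magnitude = np.sqrt((delta ** 2).sum(axis=0))
    return magnitude, delta
```

An index-based change map would then be obtained, for example, by differencing `ndbi` at the two dates and thresholding the result; the choice of threshold is left open here.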

Even if these simple methods may be effective, they usually require thresholds, and this may be a subjective matter, unless some automatic or semi-automatic approach is developed. This has been done in Bruzzone and Fernandez Prieto (2000a), where Bayesian theory is used to automatically determine the correct threshold to be applied to a difference image. In particular, this image is analyzed by considering the spatial-contextual information included in the pixel neighborhood, relying on Markov Random Fields (MRFs) to exploit inter-pixel class dependency contexts. An iterative method based on the Expectation-Maximization (EM) algorithm is used to estimate the statistical terms that characterize the distributions of the changed and unchanged pixels in the difference image. The authors report experiments on both satellite and airborne multi-spectral data: results appear to be good, and the robustness of the algorithm against noise is highlighted. An extension of this work is presented in Bruzzone and Fernandez Prieto (2000b), where a more application-oriented tool for monitoring land-cover changes is proposed. The proposed technique relies on the definition of the unsupervised change-detection problem in terms of the Bayes rule for minimum cost (BRMC), which in turn allows the generation of change-detection maps in which the more critical type of error is minimized according to end-user requirements.
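To give an idea of how a threshold can be derived automatically from the data, the sketch below fits a two-component Gaussian mixture to the values of a difference image with EM and labels each pixel with the more probable class. It is a deliberately simplified illustration, not the method of the cited papers: the Gaussian model, the initialization at the mean and the function names are assumptions made here, and the MRF spatial-contextual step is omitted.

```python
import numpy as np

def em_two_gaussians(values, n_iter=100, tol=1e-8):
    """Fit 'unchanged' (index 0) and 'changed' (index 1) Gaussians with EM."""
    x = np.asarray(values, dtype=float).ravel()
    # Assumed initialization: split the samples at their mean value.
    split = x.mean()
    lo, hi = x[x <= split], x[x > split]
    mu = np.array([lo.mean(), hi.mean()])
    var = np.array([lo.var(), hi.var()]) + 1e-6
    prior = np.array([lo.size, hi.size], dtype=float) / x.size
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each sample.
        lik = prior * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = lik / (lik.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: update priors, means and variances.
        nk = resp.sum(axis=0)
        new_mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - new_mu) ** 2).sum(axis=0) / nk + 1e-6
        prior = nk / x.size
        converged = np.max(np.abs(new_mu - mu)) < tol
        mu = new_mu
        if converged:
            break
    return prior, mu, var

def changed_mask(diff_image, prior, mu, var):
    """Minimum-error Bayes decision: True where 'changed' is more probable."""
    d = np.asarray(diff_image, dtype=float)
    lik = prior * np.exp(-0.5 * (d[..., None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return lik[..., 1] > lik[..., 0]
```

A minimum-cost variant would weight the two likelihoods with user-defined error costs before comparing them.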

A different unsupervised but equally adaptive technique, the Selective use of Multi-spectral Information (SMI), is proposed in Bruzzone and Serpico (1997). In this approach, even if a land cover change may be visible only in some spectral bands, all the bands are considered. Those where the change is not detectable are used to identify pixels affected by registration noise and pixels belonging to other non-investigated changes. In Bruzzone and Fernandez Prieto (2000c) a technique based on “adaptive parcels” (small homogeneous regions shared by both original images) is presented. The adaptive nature of parcels allows spatial-contextual information to be exploited so that noise may be reduced without damaging the boundaries of changed areas. In addition, the characterization of parcels with a set of different features allows identifying different land cover changes. In Bruzzone and Fernandez Prieto (2002) an adaptive semi-parametric technique for the unsupervised estimation of the statistical terms associated with the gray levels of changed and unchanged pixels in a difference image is presented. Statistical estimation and spatial/contextual information are jointly considered to generate the change map. Similarly, in Kasetkasem and Varshney (2002) the authors exploit the spatial correlation between adjacent pixels using Markov Random Fields. They find that the method is particularly robust against noise and misregistration. Experiments are carried out on simulated and actual images of the San Francisco Bay area.

Finally, some very specific methods have been proposed in technical literature for area surveillance. One of them is Carlotto (1997), where methods for modeling and detecting general patterns of change associated with construction and other kinds of activities that can be observed in remotely sensed imagery are presented. They include a new nonlinear prediction technique for measuring changes between images and temporal segmentation and filtering techniques for analyzing patterns of change over time. Another, very interesting, example is Hazel (2001). Here objects are first extracted from each image to be analyzed, and a site model is built; then, site models extracted from different images are compared and the differences highlighted. Besides robustness against misregistration, this method provides higher-level information and potentially allows some degree of scene understanding; moreover, addition of more imagery helps to perfect the model and thus tends to improve detection results. Finally, Smits and Meyer (2000) investigate a method that can be used to characterize and understand the spatial behavior of change by decomposing the change intensity image into a tree of entities called echelons. Such a tree can be extremely helpful in discovering connections between changes.

3.2.2 Detailed Urban Change Information from Long Temporal EO Series

For medium and long-term urban change detection, the temporal sampling of polar orbit satellites with environmental and scientific missions is almost ideal. The possibility to acquire one image per month over a long time span, which is the Landsat legacy, is indeed invaluable in this respect, and the analysis can be complemented using SAR data from ESA satellites (ERS-1 and 2 and ENVISAT), as well as those operated by JAXA and the Canadian Space Agency (the RADARSAT family). Following well-established techniques in the EO data interpretation literature, when several data sets are available over the same area at different dates, even if not from the same satellite, a change detection map can be drawn by cross-checking the land use maps obtained at different dates. This operation, however, may not always be possible, because land covers easily classified in one image type (e.g., optical images) are sometimes very difficult to extract from another type (e.g., radar images), and vice versa. Another way of accomplishing the same task, mostly when several data sets from the same sensor/satellite are available, is that of directly classifying multi-temporal images.

The very basic approach (Madhavan et al. 2001) corresponds to a previous classification step and a pixel-by-pixel post-classification comparison. A slightly different approach is proposed in Clapham (2003), relying on continuum-based classification, i.e. classification maps based on variables that assume values in a continuum, like percent impervious land surface and percent canopy cover. This allows better understanding each change, but requires a final step to assign changes to classes again. A post-classification change detector tailored to the “built-up” class may be found in Zhang (2001). It uses multi-spectral (Landsat TM) together with panchromatic optical satellite data (SPOT pan), and performs a heavy post-classification processing in order to improve the accuracy, especially for the “built-up” class. The processing is based on three steps: a co-occurrence matrix-based filtering for separating buildings from noise, an axis-oriented linking and segmentation for a complete extraction of urban and water areas, and finally mathematical morphology operations for improving the classified green areas. The differences are detected by comparison with the same results on earlier data; the authors report an accuracy of about 86 % on detection of new buildings and state that big buildings (10–20 m) can be individually detected. Finally, there are papers such as Xiuwan (2002) and Sunar (1998) comparing different methods with post-classification comparison, showing its problems but also its strength with respect to unsupervised techniques. In Xiuwan (2002), for instance, the authors provide a comparison of many methods, rather than a single method. Post-classification is used, and the importance of ancillary data (possibly integrated into a GIS) is stressed. Emphasis is put, as for other post-classification methods, on improving single date classification performance.
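The pixel-by-pixel post-classification comparison at the core of these methods amounts to cross-tabulating two classified maps. The sketch below is a generic illustration under the assumption that both maps are co-registered integer arrays with class labels in 0..n_classes-1; the function names are hypothetical.

```python
import numpy as np

def from_to_matrix(map_t1, map_t2, n_classes):
    """Count pixels for every (class at date 1, class at date 2) pair.

    Entry [a, b] is the number of pixels labeled a at the first date and b
    at the second one; the diagonal holds the unchanged pixels.
    """
    a = np.asarray(map_t1, dtype=np.int64).ravel()
    b = np.asarray(map_t2, dtype=np.int64).ravel()
    codes = a * n_classes + b  # one integer code per from-to pair
    counts = np.bincount(codes, minlength=n_classes ** 2)
    return counts.reshape(n_classes, n_classes)

def change_mask(map_t1, map_t2):
    """Boolean map of the pixels whose label differs between the two dates."""
    return np.asarray(map_t1) != np.asarray(map_t2)
```

Because every single-date classification error shows up as an apparent change in this cross-tabulation, improving single-date accuracy has a direct effect on the quality of the change map.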

When the temporal sequence comes from the same sensor/satellite, urban land use change can be detected and classified by means of a supervised classification of multi-temporal data, either original raw data or transformed ones. A first example is Li and Yeh (1998), where the authors use principal component analysis of optical Landsat multi-temporal images to overcome the problems related to obtaining from-to class information. Tested on urban areas in Dongguan, close to Hong Kong, the method has shown superior performance with respect to conventional post-classification comparison, both in terms of accuracy and of limited overestimation of land use change. In Seto et al. (2002), the Tasseled Cap transformation is used instead, since it provides 3 bands virtually independent from the observed scene. After normalization and transformation, data are classified using a Bayesian supervised classifier and a hierarchical approach that leads directly to 9 change classes. A final image segmentation approach is used to discard “salt-and-pepper” noise in the final classification map. The accuracy is carefully assessed and is very good for most of the change classes, showing that the method is reliable and precise.

A very interesting point of view has been added to this series of works by Smits and Annoni (2000), where the user requirements are explicitly introduced in the change detection chain. The first point raised by the paper is that region-based change detection is usually required by the final user, while the above-mentioned methods are mainly pixel-based. Second, a cost function taking into account the user requirements is often the key for a successful acceptance of the final change map. This consideration leads us quite naturally to the topic of the integration of GIS and remote sensing data. An interface to and from GIS layers is usually essential for providing information that is valuable for final users, especially in urban areas. This may lead to a direct comparison of a single-date classification with a GIS layer, like in Prol-Ledesma et al. (2002), or to driving the classification by means of the already considered GIS layer (Jansen and Molenaar 1995; Smits and Annoni 1999; Smit and Fuller 2001).

As an algorithmic note to this overview, most of the examples discussed above deal with the comparison of a couple of images at a time. This is the most direct definition of change detection and can be enough to adapt to the different time scales of different events in urban areas. However, the availability of more and more data sets and the possibility to model the temporal process using information extracted from images for more dates has been also explored (Almeida et al. 2003; Xia and Yeh 2000).

3.3 Sudden Change Detection in SAR Images Using “ad hoc” Indexes

One of the most important problems related to urban change detection is the recognition of sudden changes that may happen due to a natural or human-induced disaster, but also because of a specific change in a given site, as mentioned above. Without extensively treating the problem of damage assessment after earthquakes (which is thoroughly analyzed in Dell’Acqua and Gamba 2012), we would like to focus in the following on sudden change detection in case of natural/man-made disasters using “ad hoc” indexes. In this section, therefore, the focus is more on the spatial scale and resolution, while the punctual nature of the event requires images that are as temporally close as possible to it.

It is usually expected that a sudden change induces an extensive change in the appearance of the objects in EO data, but this is not always the case. Changes depend on the phenomenon, on the spatial resolution and on the type of sensor providing the data. Sometimes, for instance, it is easier to spot a change in the pattern of several buildings than in a building-by-building comparison. This is particularly true when dealing with SAR data, where the combination of materials and geometry defines the backscattering pattern in a given area. Therefore, although it is recognized that change detection applications require the most detailed imagery available, the best way to process these data and achieve a (semi) automatic detection of the change does not always correspond to an analysis at the pixel level.

For SAR imagery of urban areas, for instance, it has been demonstrated in Gamba et al. (2011) that textural features are helpful to extract urban extents, and specifically that a combination of different scales of texture is important. In that work it was highlighted that the best result in extracting extents from high resolution and very high resolution SAR imagery is obtained when combining co-occurrence matrix textural features (Haralick et al. 1973) and Local Indicator of Spatial Association (LISA, Sokhal and Thompson 2006) features. The latter, in particular, describe very well the chessboard patterns that may appear in urban areas due to the intertwined high backscattering and mirror-like reflectance phenomena caused by the simultaneous presence of buildings (corner reflectors) and roads (flat surfaces). The same methodologies can be used for change detection, but with different.

On the last point, the most recent results include an analysis of LISA indicators for building-level change detection, when very local damage occurs due to a peculiar and spatially limited event. LISA indicators (the Moran index, the Geary index and the Getis-Ord index) describe together the positive and negative autocorrelation effects of SAR backscatter in urban areas. Specifically, Moran’s index evaluates the similarity between the neighbors of a pixel by comparing its value with the average local value, and describes local homogeneity. On the contrary, Geary’s index identifies areas of high contrast, providing a measure of local dissimilarity, while the Getis-Ord index \( {G}_i \) is useful to identify “outliers”, i.e. values very different from the surroundings. The latter is given by the following formula

$$ {G}_i=\frac{\sum_j {w}_{ij}{x}_j}{\sum_j {x}_j} \tag{3.1} $$

where \( {x}_i \) is the generic pixel value at the i-th position, and \( {w}_{ij} \) are the elements of a weight matrix, here set to either ‘0’ or ‘1’ according to the so-called “Rook’s case”.
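As a rough illustration of Eq. 3.1, the sketch below computes \( {G}_i \) for every pixel of a small image chip. Several choices are assumptions made here rather than details stated in the text: the Rook’s case is taken to mean a weight of 1 for the four edge-sharing neighbors and 0 elsewhere (including the pixel itself), pixels outside the chip contribute nothing, and the denominator is the sum over all pixels of the array that is passed in. The function name is hypothetical.

```python
import numpy as np

def getis_ord_rook(image):
    """Per-pixel Getis-Ord G_i with binary Rook's-case weights.

    Numerator: sum of the four edge-sharing neighbors of each pixel.
    Denominator: sum of all pixel values in the array (assumed reading
    of the formula).
    """
    x = np.asarray(image, dtype=float)
    total = x.sum()
    if total == 0:
        raise ValueError("G_i is undefined when all pixel values sum to zero")
    # Zero padding, so that border pixels simply have fewer neighbors.
    p = np.pad(x, 1, mode="constant", constant_values=0.0)
    neighbors = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    return neighbors / total
```

Averaging the resulting map over each building footprint would give one value per building, comparable across damaged and undamaged structures.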

The example considered here refers to an event that occurred on 7th June, 2012 in Conversano, southern Italy, where a gas explosion caused the complete collapse of a three-story building in the densely built-up area of the village center. The damage inflicted on the surrounding buildings was, though, hardly visible to an external observer, as can be seen in the pictures published in the news. In this case the damage was assumed to be visible in very high resolution radar images, due to the substantial shape change of the target, but very complex to detect, because of the surrounding environment. Moreover, the limited spatial extent of the damage would ease fair comparisons between backscattering patterns of damaged and non-damaged buildings in real-world cases. By considering the above-mentioned Getis-Ord LISA indicator, and looking at the patterns averaged on the building shape for the area, it is possible to obtain a first interpretation of a TerraSAR-X scene covering the whole village. The quantitative results are shown in Fig. 3.2, where the damaged building has a distinct behavior, different from most of the undamaged ones. By using this indicator, however, some false positives could also show up, as visible in the figure, but the majority of the area reveals the lack of any change.

Fig. 3.2 Getis-Ord patterns for damaged and undamaged buildings in the Conversano (Italy) event using the LISA Getis-Ord textural index

3.4 “Hypertemporal” Sequences

The second example of urban change detection in this chapter refers to a very different temporal scale than the one discussed in the previous section, and also to a different target with respect to the spatial resolution required to match its extents. Specifically, the idea is to classify the temporal behavior of urban areas (as a whole, or at the block level) by considering a sequence of datasets covering the same area. Since the available temporal datasets tend to be long sequences of information from the same sensor, which may be considered as bands of the same image, this idea was introduced in Gamba et al. (2008), and these sequences have been labeled “hypertemporal”, by analogy with hyperspectral imaging, where many bands help to discriminate targets via their spectral behavior over several tens of wavelengths. Examples of the same idea already existing in the technical literature (in addition to those mentioned at the end of the hotspot section) are Yang and Lo (2002) and Masek et al. (2000). In the former, the land use/land cover change data of the Atlanta metropolitan area over 25 years were extracted by using a time series of Landsat MSS and TM images; in the latter, the dynamics of urban growth in the Washington DC metropolitan area in the period from 1973 to 1996 were studied from Landsat observations.

The main idea, following Gamba et al. (2008), is that a hypertemporal data series \( {X}_n \), \( n\in \left\{1,\dots,N\right\} \), is a long enough sequence of consistent data sets, either directly from the sensor or obtained by processing EO data, with \( N\gg 1 \). For each pixel with geographical position (i, j), the sequence \( {X}_n\left(i,j\right) \) describes its temporal behavior. In order to be able to work on these sequences, a few pre-processing steps must be guaranteed:

  • first of all, since the “bands” of this hypertemporal image are not acquired simultaneously, their alignment must be ensured by means of a multitemporal co-registration;

  • second, the data at different dates are partly correlated, both spatially and temporally; they can (and must) be subject to de-noising and/or feature reduction steps. In Gamba et al. (2008), for instance, a multitemporal speckle filter like the one in Quegan and Yu (2001) was considered, while for an optical sequence techniques like Principal Component Analysis or Minimum Noise Fraction may be considered more suitable.

A graphical representation of a typical processing procedure for a hypertemporal data set (very similar to the one for a hyperspectral image) is thus proposed in Fig. 3.3.

Fig. 3.3 Graphical representation of the main processing steps of the hypertemporal data processing chain

Although the procedure in the previous figure refers to an analysis at the pixel level, the same approach may be considered at a very different spatial scale by including information about “objects” in the target area, whose changes are more relevant than those detected at the pixel scale. Moreover, in a per-segment approach, there is no need for a precise denoising of the hypertemporal sequence, since the same effect could be achieved by means of the spatial analysis (e.g., a simple spatial average). If a segmentation map is available, the first step of the procedure should be to substitute pixel values with segment values for each of the bands of the hypertemporal sequence. Please note however that these measures are most significant for each segment where boundaries have been extracted in accordance with some consistent segmentation procedure based on the original EO data itself. If the boundary information is obtained from an independently extracted GIS layer, the analysis at the segment level may be less significant.
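The substitution of pixel values with segment values can be sketched as follows. This is a minimal illustration assuming a co-registered stack of shape (N, rows, cols) and an integer segmentation map with labels starting at 0; the simple mean is used as the per-segment value, and the function names are hypothetical.

```python
import numpy as np

def per_segment_means(stack, labels):
    """Mean value of every segment at every date.

    Returns an array of shape (N, n_segments): one temporal profile
    per segment.
    """
    stack = np.asarray(stack, dtype=float)
    flat = np.asarray(labels, dtype=np.int64).ravel()
    n_seg = int(flat.max()) + 1
    counts = np.bincount(flat, minlength=n_seg)
    means = np.empty((stack.shape[0], n_seg))
    for n, band in enumerate(stack):
        sums = np.bincount(flat, weights=band.ravel(), minlength=n_seg)
        # Labels that never occur keep a mean of 0 instead of dividing by 0.
        means[n] = sums / np.maximum(counts, 1)
    return means

def substitute_segment_values(stack, labels):
    """Replace each pixel with the mean of its segment, band by band."""
    means = per_segment_means(stack, labels)
    return means[:, np.asarray(labels, dtype=np.int64)]
```

The columns of the array returned by `per_segment_means` are the per-segment temporal profiles that can then be plotted or classified.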

An example considering a sequence of SAR images over the town of Pavia, Italy, and using a GIS layer determining boundaries between homogeneous portions of the town with respect to building types and land uses, is proposed in Fig. 3.4.

Fig. 3.4 Per-segment analysis of a hypertemporal ERS SAR data sequence for Pavia (Italy), from 1996 to 2004. The different temporal evolution of the mean backscattered value for a few segments depicted in the bottom graph reflects the substantially stable situation in the town, with the exception of segment n. 11

Specifically, in this case a sequence of 7 SAR images collected by the ERS-1 and 2 satellites in a time range of 8 years, between 1996 and 2004, is considered. After a precise co-registration, the spatial segmentation of the sequence and thus the per-segment index computation is obtained by considering the above-mentioned GIS layer of the test area. In Fig. 3.4 the different segments are highlighted by different colors, and their temporal pattern is depicted by using the same color, to visually show the match. As for the index, a simple per-segment average of the backscattered value is used as a way to analyze the temporal behavior of each portion of the town. This index is strictly related to the presence of strong scattering (double bounce) effects, which in turn are highly correlated in urban areas with buildings and other artificial built-up structures. As clearly visible, the index shows a substantial stability, and no actual change can be detected, apart from the sudden drop in the average value for all segments due to the flooding event in 1998. The general trend is a slight decrease, with the exception of segment 11, whose tendency to increasing values from 1999 on is due to the construction of the new site for the Engineering School of the University.

A different example for this hypertemporal sequence framework is the classification of urban evolution patterns using Landsat data. The idea is that Landsat provides a continuous and interesting monitoring of land use patterns, and can be used to monitor, at the global level, urbanization and its change in time. Besides the variability in different geographical areas, here we focus on the possibility to extract different time behaviors, discriminating areas that, for instance, became urbanized at different dates. The results shown in the following are obtained therefore by means of a processing chain composed of the following two steps:

  1. urban extent extraction at different dates using the Normalized Difference Spectral Vector (NDSV);

  2. unsupervised multitemporal classification of the stack of urban extent maps at different dates to discriminate among different temporal patterns.

The first step follows the approach proposed in Trianni and Angiuli (2013). The NDSV has been proposed as a means to group existing normalized difference indexes (such as the normalized difference vegetation index, NDVI, the normalized difference water index, NDWI, and the normalized difference built-up index, NDBI). The idea is to include in one single vector all the possible normalized indexes that can be computed starting from a Landsat 5 or 7 image, considering therefore 6 bands and 15 possible combinations (the dual ones are not considered as their result is the same with opposite sign). As a result, each pixel is characterized by a set of values that have been at this point “labeled” only partially, and whose full potential is still to be explored. By looking at a standard calibrated Landsat scene, it can be shown that urban areas exhibit an NDSV spectral signature that is basically “flat” across all indexes, and can be discriminated from other classes by their distinct behavior in this new “multispectral” 15-dimensional space. The NDSV can therefore be exploited to extract human settlement extents by applying a classifier such as the Spectral Angle Mapper used in the original version, which captures the differences in multispectral vectors and is robust with respect to differences in illumination. Due to the unavailability of this classifier in the analysis framework that was used for the implementation, and according to our experience, we considered instead, with very similar results, a Classification and Regression Trees (CART) supervised procedure.
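The construction of the 15-element vector can be sketched as below. This is a minimal illustration, not the implementation used for the results in this chapter: the ordering of the band pairs and the sign convention (lower band index first) are assumptions, as are the function names. A plain spectral angle function is included only to show the kind of comparison a Spectral Angle Mapper performs; the CART procedure actually used is not reproduced.

```python
from itertools import combinations

import numpy as np

def ndsv(bands, eps=1e-12):
    """Normalized Difference Spectral Vector of a (6, rows, cols) stack.

    Returns an array of shape (15, rows, cols), one normalized difference
    per unordered band pair, together with the list of pairs used.
    """
    b = np.asarray(bands, dtype=float)
    pairs = list(combinations(range(b.shape[0]), 2))
    vector = np.stack([(b[i] - b[j]) / (b[i] + b[j] + eps) for i, j in pairs])
    return vector, pairs

def spectral_angle(vectors, reference):
    """Angle (radians) between each pixel vector and a reference signature."""
    v = np.asarray(vectors, dtype=float)    # shape (d, rows, cols)
    r = np.asarray(reference, dtype=float)  # shape (d,)
    dot = np.tensordot(r, v, axes=(0, 0))
    norm = np.linalg.norm(v, axis=0) * np.linalg.norm(r)
    cos = np.clip(dot / np.maximum(norm, 1e-12), -1.0, 1.0)
    return np.arccos(cos)
```

With six input bands the number of pairs is 6 × 5 / 2 = 15, which matches the dimensionality quoted in the text.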

The second step instead exploits a very basic K-means or ISODATA unsupervised classifier. The input of the classifier is the multi-temporal stack of human settlement extents extracted in the previous step, and thus a set of binary (urban/non-urban) images where each pixel reflects the pattern of urbanization in the corresponding scene portion. To avoid, or at least reduce as much as possible, errors in classification, the training sets of the previously described NDSV classification procedure are selected by jointly considering the first and the last images in the stack, to ensure a selection of areas that belong to the same land use along the whole sequence. Moreover, since misclassifications are expected, the assumption that urban areas tend to become bigger and do not shrink is considered. Although not always accurate, for the time period covered by our satellite archives, corresponding to one of the biggest pushes toward urbanization in human history, this is a quite reasonable assumption. In terms of our algorithms, it translates into a set of masking operations implemented on the time sequence of extracted human settlement extent maps in reverse order. In other words, the extents at one date are constrained by those obtained at a later date and cannot extend beyond them.
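The reverse-order masking can be expressed compactly as a cumulative logical AND that starts from the last date. The sketch below assumes a chronologically ordered boolean stack of shape (N, rows, cols); the function names are hypothetical. The second function is not the K-means/ISODATA step described above: it is only a deterministic summary (the index of the first date at which a pixel is urban), included here to show what kind of temporal patterns remain after the constraint is applied.

```python
import numpy as np

def enforce_growth(extent_stack):
    """Constrain each urban extent map by the maps at all later dates.

    A pixel stays urban at date n only if it is urban at date n and at
    every later date, so that extents can grow but never shrink in time.
    """
    s = np.asarray(extent_stack, dtype=bool)
    # Accumulate the AND from the last date backwards, then restore order.
    return np.logical_and.accumulate(s[::-1], axis=0)[::-1]

def first_urban_index(constrained_stack):
    """Index of the first urban date per pixel (N if never urban)."""
    s = np.asarray(constrained_stack, dtype=bool)
    first = s.argmax(axis=0)
    first[~s.any(axis=0)] = s.shape[0]
    return first
```

After the constraint, each pixel sequence is non-decreasing in time, so a stack of N dates admits at most N + 1 distinct temporal patterns.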

The overall chain is implemented in Google Earth Engine (2013), a powerful and very flexible platform for analyzing multiple remote sensing data sets, including the whole collection of NASA/USGS Landsat imagery, which runs the processing steps on Google’s dedicated cloud storage and computational hardware.

Results obtained for the city of Sao Paulo with the implemented version of the algorithm described above are shown in Fig. 3.5. They correspond to a sequence of 36 Landsat-5 images from 1992 to 2010, selected every other year with the criterion of a cloud cover of less than 1 % of the scene, and combined into yearly composites. Note that the area in purple in the final multi-temporal classification corresponds to the core part of the city, which did not change during the time period of this analysis, while the other shades, from purple to blue, highlight different temporal behaviors. Specifically, the areas in the center of Sao Paulo show changes in the typology of built-up elements, which in turn appear as differently colored patterns in the hypertemporal combined map.

Fig. 3.5 Analysis of a hypertemporal sequence of Landsat data for the city of Sao Paulo (Brazil), from 1992 (left top image) to 2010 (right top image). The different temporal patterns for the urbanization obtained from a stack of 36 images are shown with different colors in the bottom map (see the text for a more detailed explanation and analysis)

3.5 Conclusions

The techniques and examples presented in this chapter may be summarized, according to the authors’ vision, in the following points:

  • Spatial and temporal scales are, in general, equally important in urban area monitoring. Although this statement is somewhat trivial, it is still not always taken into account. Examples are studies on damage extraction that use images with an unsuitable spatial or temporal sampling, or change detection approaches that do not match the temporal scale of the event.

  • In this sense, a second important point is that the selection of relevant scales is problem-dependent. This is a less trivial statement, and data processing algorithms should take it into account as well. A “change detection” technique does not always fit the temporal scale of the change of interest, although it may suit the same data for a different detection problem. An example is urban sprawl monitoring using SAR coherence, which makes sense because of the long time span of this change, while the same approach applied to urban hotspots would not work and would require additional steps, such as a dedicated spatial filtering routine.

  • A more interesting outcome of the research discussed in this chapter is that the combined use of relevant spatial and temporal scales in urban areas is a feature-dependent (object-dependent) problem. Depending on the application, different features or objects are affected by the change to be detected, and the selection of the target object sets a priority on the spatial and temporal scales to be considered.

  • The most important challenge in using EO data to capture changes in urban areas and to monitor their evolution at different scales is thus that these data do not cover enough scales to be useful by themselves. Some a priori or associated information is required, together with some form of fusion at the information or decision level. For instance, damage mapping at the block level proved (Dell’Acqua and Polli 2011) to be more effective when information about where damage is likely to occur is included at the global city scale, usually obtained by running vulnerability/exposure models of the area.