Spatial prediction and spatial dependence monitoring on georeferenced data streams

  • Antonio Balzanella
  • Antonio Irpino
Original Paper


This paper deals with the analysis of data streams recorded by georeferenced sensors. We focus on two problems: measuring the spatial dependence among the observations recorded over time, and predicting the data distribution at locations where no sensor record is available. The proposed strategy consists of two main steps: an online step summarizes the incoming data records by histograms; an offline step measures the spatial dependence and performs the spatial prediction. The main novelties are the introduction of the variogram and of kriging for histogram data. With these new tools we can monitor the spatial dependence and perform the prediction starting from histogram data rather than from sensor records. The effectiveness of the proposal is evaluated on real and simulated data.
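The two-step strategy can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: each sensor's window of readings is summarized by a vector of equally spaced quantiles (a stand-in for a histogram), the dissimilarity between two summaries is the squared L2 Wasserstein distance approximated on that quantile grid, and the empirical variogram is the classical binned estimator with this distance in place of squared differences. The names `quantile_summary`, `wasserstein2_sq` and `empirical_variogram` are hypothetical.

```python
import numpy as np


def quantile_summary(values, n_quantiles=20):
    """Online-step stand-in: summarize a window of readings by quantiles.

    Assumption: a quantile vector on a fixed probability grid plays the
    role of the histogram summary mentioned in the abstract.
    """
    probs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(np.asarray(values, dtype=float), probs)


def wasserstein2_sq(q_a, q_b):
    """Approximate squared L2 Wasserstein distance between two 1-D
    distributions given as quantile vectors on the same grid."""
    q_a = np.asarray(q_a, dtype=float)
    q_b = np.asarray(q_b, dtype=float)
    return float(np.mean((q_a - q_b) ** 2))


def empirical_variogram(coords, summaries, bin_edges):
    """Offline-step stand-in: binned empirical variogram.

    For each distance bin, returns half the average squared distance
    between the summaries of all sensor pairs whose spatial separation
    falls in that bin. Bins without pairs are reported as NaN.
    """
    coords = np.asarray(coords, dtype=float)
    n_bins = len(bin_edges) - 1
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins, dtype=int)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = np.linalg.norm(coords[i] - coords[j])
            k = np.searchsorted(bin_edges, h, side="right") - 1
            if 0 <= k < n_bins:
                sums[k] += wasserstein2_sq(summaries[i], summaries[j])
                counts[k] += 1
    gamma = np.full(n_bins, np.nan)
    filled = counts > 0
    gamma[filled] = sums[filled] / (2 * counts[filled])
    return gamma, counts
```

In a streaming setting, `quantile_summary` would be applied to each sensor's most recent window, and `empirical_variogram` would be recomputed on the resulting summaries to track how the spatial dependence evolves. A kriging predictor would additionally require fitting a variogram model and solving for the kriging weights, which this sketch does not attempt.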


Data stream mining · Histogram data · Variogram · Kriging predictor




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Mathematics and Physics, Università della Campania Luigi Vanvitelli, Caserta, Italy
