Volume 16, Issue 2, pp 329–355

An interactive framework for spatial joins: a statistical approach to data analysis in GIS

  • Shayma Alkobaisi
  • Wan D. Bae
  • Petr Vojtěchovský
  • Sada Narayanappa


Abstract

Many Geographic Information Systems (GIS) handle large volumes of geospatial data. Spatial joins over two or more geospatial datasets are very common GIS operations for data analysis and decision support. However, evaluating spatial joins can be very time-intensive because of the size of the datasets. In this paper, we propose an interactive framework that provides faster approximate answers to spatial joins. The proposed framework utilizes two statistical methods: probabilistic join and sampling-based join. The probabilistic join method provides a speedup of two orders of magnitude with no correctness guarantee, while the sampling-based method provides an order-of-magnitude improvement over full indexing-tree joins of the datasets and also provides running confidence intervals. The framework allows users to trade off speed against bounded accuracy; hence it provides truly interactive data exploration. The two methods are evaluated empirically with real and synthetic datasets.
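The idea of a sampling-based join with running confidence intervals can be illustrated with a minimal sketch. This is not the algorithm from the paper: it assumes objects are reduced to axis-aligned bounding rectangles, uses brute-force intersection tests instead of a quad-tree or R-tree, samples with replacement, and relies on a normal approximation for the interval. All names (`intersects`, `sampled_join_estimate`, `batch`, `max_batches`) are hypothetical and chosen only for illustration.

```python
import math
import random


def intersects(a, b):
    """True if axis-aligned rectangles (xmin, ymin, xmax, ymax) overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def sampled_join_estimate(r, s, batch=50, max_batches=10, z=1.96, seed=0):
    """Yield (samples_used, estimate, half_width) after each batch.

    Rectangles are drawn from r uniformly with replacement. For each draw
    the number of intersecting rectangles in s is counted by brute force,
    and the running mean count is scaled by len(r) to estimate the join
    size. The half-width comes from a normal approximation (assumption).
    """
    rng = random.Random(seed)
    n = 0
    total = 0
    total_sq = 0
    for _ in range(max_batches):
        for _ in range(batch):
            a = rng.choice(r)
            c = sum(1 for b in s if intersects(a, b))
            n += 1
            total += c
            total_sq += c * c
        mean = total / n
        # Sample variance of the per-object match counts, clamped at zero
        # to guard against floating-point rounding.
        var = max((total_sq - n * mean * mean) / (n - 1), 0.0)
        half_width = z * math.sqrt(var / n) * len(r)
        yield n, mean * len(r), half_width


def random_rects(count, rng, extent=100.0, max_side=5.0):
    """Hypothetical synthetic data: small rectangles in a square extent."""
    rects = []
    for _ in range(count):
        x, y = rng.uniform(0, extent), rng.uniform(0, extent)
        w, h = rng.uniform(0, max_side), rng.uniform(0, max_side)
        rects.append((x, y, x + w, y + h))
    return rects


if __name__ == "__main__":
    data_rng = random.Random(42)
    r = random_rects(2000, data_rng)
    s = random_rects(2000, data_rng)
    for n, est, half in sampled_join_estimate(r, s):
        print(f"samples={n:4d}  estimate={est:10.1f}  +/- {half:8.1f}")
        # Stop early once the interval is narrow enough for the user.
        if est > 0 and half / est < 0.10:
            break
```

Because the estimator is a generator, a caller can stop as soon as the reported interval is narrow enough, which is one simple way to trade speed against accuracy interactively.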


Keywords

Interactive queries, Spatial join, Join probability, Probabilistic joins, Incremental sampling, Quad-tree, R-tree, GIS



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Shayma Alkobaisi (1)
  • Wan D. Bae (2), email author
  • Petr Vojtěchovský (3)
  • Sada Narayanappa (4)

  1. Faculty of Information Technology, United Arab Emirates University, Al Ain, United Arab Emirates
  2. Department of Mathematics, Statistics and Computer Science, University of Wisconsin-Stout, Menomonie, USA
  3. Department of Mathematics, University of Denver, Denver, USA
  4. Advanced Computing Technology, Jeppesen, Inc, Englewood, USA
