Skip to main content

A Comparison of Distributed Spatial Data Management Systems for Processing Distance Join Queries

  • Conference paper
  • First Online:
Advances in Databases and Information Systems (ADBIS 2017)

Abstract

Due to the ubiquitous use of spatial data applications and the large amounts of spatial data that these applications generate, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Two of the most studied distance join queries are the K Closest Pair Query (KCPQ) and the \(\varepsilon \) Distance Join Query (\(\varepsilon \) DJQ). The KCPQ finds the K closest pairs of points from two datasets and the \(\varepsilon \) DJQ finds all the possible pairs of points from two datasets, that are within a distance threshold \(\varepsilon \) of each other. Distributed cluster-based computing systems can be classified in Hadoop-based and Spark-based systems. Based on this classification, in this paper, we compare two of the most current and leading distributed spatial data management systems, namely SpatialHadoop and LocationSpark, by evaluating the performance of existing and newly proposed parallel and distributed distance join query algorithms in different situations with big real-world datasets. As a general conclusion, while SpatialHadoop is more mature and robust system, LocationSpark is the winner with respect to the total execution time.

F. García-García, A. Corral, L. Iribarne and M. Vassilakopoulos — Work funded by the MINECO research project [TIN2013-41576-R].

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Available at https://hadoop.apache.org/.

  2. 2.

    Available at https://spark.apache.org/.

  3. 3.

    Available at http://spatialhadoop.cs.umn.edu/datasets.html.

  4. 4.

    Available at https://github.com/aseldawy/spatialhadoop2.

  5. 5.

    Available at https://github.com/merlintang/SpatialSpark.

References

  1. Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.H.: Hadoop-GIS: a high performance spatial data warehousing system over MapReduce. PVLDB 6(11), 1009–1020 (2013)

    Google Scholar 

  2. Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Algorithms for processing \(K\)-closest-pair queries in spatial databases. Data Knowl. Eng. 49(1), 67–104 (2004)

    Article  Google Scholar 

  3. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI Conference, pp. 137–150 (2004)

    Google Scholar 

  4. Eldawy, A., Alarabi, L., Mokbel, M.F.: Spatial partitioning techniques in SpatialHadoop. PVLDB 8(12), 1602–1613 (2015)

    Google Scholar 

  5. Eldawy, A., Mokbel, M.F.: SpatialHadoop: a MapReduce framework for spatial data. In: ICDE Conference, pp. 1352–1363 (2015)

    Google Scholar 

  6. García-García, F., Corral, A., Iribarne, L., Vassilakopoulos, M., Manolopoulos, Y.: Enhancing SpatialHadoop with closest pair queries. In: Pokorný, J., Ivanović, M., Thalheim, B., Šaloun, P. (eds.) ADBIS 2016. LNCS, vol. 9809, pp. 212–225. Springer, Cham (2016). doi:10.1007/978-3-319-44039-2_15

    Chapter  Google Scholar 

  7. Lenka, R.K., Barik, R.K., Gupta, N., Ali, S.M., Rath, A., Dubey, H.: Comparative analysis of SpatialHadoop and GeoSpark for geospatial big data analytics, CoRR abs/1612.07433 (2016)

    Google Scholar 

  8. Li, F., Ooi, B.C., Özsu, M.T., Wu, S.: Distributed data management using MapReduce. ACM Comput. Surv. 46(3), 31:1–31:42 (2014)

    Google Scholar 

  9. Roumelis, G., Corral, A., Vassilakopoulos, M., Manolopoulos, Y.: New plane-sweep algorithms for distance-based join queries in spatial databases. GeoInformatica 20(4), 571–628 (2016)

    Article  Google Scholar 

  10. Shi, J., Qiu, Y., Minhas, U.F., Jiao, L., Wang, C., Reinwald, B., Özcan, F.: Clash of the titans: mapreduce vs. spark for large scale data analytics. PVLDB 8(13), 2110–2121 (2015)

    Google Scholar 

  11. Tang, M., Yu, Y., Malluhi, Q.M., Ouzzani, M., Aref, W.G.: Locationspark: a distributed in-memory data management system for big spatial data. PVLDB 9(13), 1565–1568 (2016)

    Google Scholar 

  12. Tang, M., Yu, Y., Aref, W.G., Mahmood, A.R., Malluhi, Q.M., Ouzzani, M.: In-memory distributed spatial query processing and optimization, April 2017. http://merlintang.github.io/paper/memory-distributed-spatial.pdf

  13. Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: efficient in-memory spatial analytics. In: SIGMOD Conference, pp. 1071–1085 (2016)

    Google Scholar 

  14. You, S., Zhang, J., Gruenwald, L.: Large-scale spatial join query processing in cloud. In: ICDE Workshops, pp. 34–41 (2015)

    Google Scholar 

  15. You, S., Zhang, J., Gruenwald, L.: Spatial join query processing in cloud: Analyzing design choices and performance comparisons. In: ICPPW Conference, pp. 90–97 (2015)

    Google Scholar 

  16. Yu, J., Wu, J., Sarwat, M.: GeoSpark: a cluster computing framework for processing large-scale spatial data. In: SIGSPATIAL Conference, pp. 70:1–70:4 (2015)

    Google Scholar 

  17. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI Conference, pp. 15–28 (2012)

    Google Scholar 

  18. Zhang, H., Chen, G., Ooi, B.C., Tan, K.-L., Zhang, M.: In-memory big data management and processing: a survey. TKDE 27(7), 1920–1948 (2015)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Antonio Corral .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

García-García, F., Corral, A., Iribarne, L., Mavrommatis, G., Vassilakopoulos, M. (2017). A Comparison of Distributed Spatial Data Management Systems for Processing Distance Join Queries. In: Kirikova, M., Nørvåg, K., Papadopoulos, G. (eds) Advances in Databases and Information Systems. ADBIS 2017. Lecture Notes in Computer Science(), vol 10509. Springer, Cham. https://doi.org/10.1007/978-3-319-66917-5_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-66917-5_15

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66916-8

  • Online ISBN: 978-3-319-66917-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics