
RkNN Query Processing in Distributed Spatial Infrastructures: A Performance Study

  • Conference paper
Model and Data Engineering (MEDI 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10563)

Abstract

The Reverse k-Nearest Neighbor (RkNN) problem, i.e., finding all objects in a dataset that have a given query point among their k nearest neighbors, has received increasing attention in recent years. RkNN queries are of particular interest in a wide range of applications, such as decision support systems, resource allocation, profile-based marketing, and location-based services. With the increasing volume of spatial data, it is difficult to perform RkNN queries efficiently in spatial data-intensive applications because of limited computational capability and storage resources. In this paper, we investigate how to design and implement distributed RkNN query algorithms on shared-nothing spatial cloud infrastructures such as SpatialHadoop and LocationSpark. SpatialHadoop is a framework that inherently supports spatial indexing on top of Hadoop to perform spatial queries efficiently. LocationSpark is a recent spatial data processing system built on top of Spark. We have evaluated the performance of the distributed RkNN query algorithms on both SpatialHadoop and LocationSpark with large real-world datasets. The experiments demonstrate the efficiency and scalability of our proposal in both distributed spatial data management systems and show the performance advantages of LocationSpark.
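
To make the RkNN definition in the abstract concrete, the following is a minimal single-machine sketch in Python. It is a brute-force illustration of the problem statement only, not the distributed algorithms evaluated in the paper. The function name `rknn_brute_force`, the tuple representation of points, the Euclidean metric, and the tie-handling rule (a point qualifies if fewer than k other dataset points are strictly closer to it than the query point) are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def rknn_brute_force(points, q, k):
    """Return the dataset points that have q among their k nearest neighbors.

    Hypothetical helper for illustration: O(n^2) distance computations,
    monochromatic setting (each point's neighbors come from the same dataset,
    with q treated as an extra candidate neighbor).
    """
    result = []
    for i, p in enumerate(points):
        d_pq = dist(p, q)
        # Count the other dataset points that are strictly closer to p than q is.
        closer = sum(
            1
            for j, other in enumerate(points)
            if j != i and dist(p, other) < d_pq
        )
        # q is one of p's k nearest neighbors if fewer than k points beat it.
        if closer < k:
            result.append(p)
    return result


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (5, 0), (6, 0)]
    print(rknn_brute_force(data, (2, 0), k=1))
    print(rknn_brute_force(data, (2, 0), k=2))
```

The quadratic cost of this approach on a single machine is the kind of limitation that motivates partitioned, index-based processing on systems such as SpatialHadoop and LocationSpark.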

The work of F. García-García, A. Corral, L. Iribarne and M. Vassilakopoulos was funded by the MINECO research project [TIN2013-41576-R].

Author information

Corresponding author

Correspondence to Antonio Corral.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

García-García, F., Corral, A., Iribarne, L., Vassilakopoulos, M. (2017). RkNN Query Processing in Distributed Spatial Infrastructures: A Performance Study. In: Ouhammou, Y., Ivanovic, M., Abelló, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2017. Lecture Notes in Computer Science, vol 10563. Springer, Cham. https://doi.org/10.1007/978-3-319-66854-3_15

  • DOI: https://doi.org/10.1007/978-3-319-66854-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66853-6

  • Online ISBN: 978-3-319-66854-3

  • eBook Packages: Computer Science, Computer Science (R0)
