RkNN Query Processing in Distributed Spatial Infrastructures: A Performance Study

García-García, Francisco; Corral, Antonio; Iribarne, Luis; Vassilakopoulos, Michael

doi:10.1007/978-3-319-66854-3_15


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10563)

Included in the following conference series:

International Conference on Model and Data Engineering


Abstract

The Reverse k-Nearest Neighbor (RkNN) problem, i.e., finding all objects in a dataset that have a given query point among their k nearest neighbors, has received increasing attention in recent years. RkNN queries are of particular interest in a wide range of applications, such as decision support systems, resource allocation, profile-based marketing, and location-based services. With the increasing volume of spatial data, it is difficult to perform RkNN queries efficiently in spatial data-intensive applications because of limited computational capability and storage resources. In this paper, we investigate how to design and implement distributed RkNN query algorithms using shared-nothing spatial cloud infrastructures such as SpatialHadoop and LocationSpark. SpatialHadoop is a framework that inherently supports spatial indexing on top of Hadoop to perform spatial queries efficiently. LocationSpark is a recent spatial data processing system built on top of Spark. We have evaluated the performance of the distributed RkNN query algorithms on both SpatialHadoop and LocationSpark with large real-world datasets. The experiments demonstrate the efficiency and scalability of our proposal in both distributed spatial data management systems and show the performance advantages of LocationSpark.
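
To make the RkNN definition in the abstract concrete, the following is a minimal single-machine sketch in Python. It is a brute-force illustration of the query semantics only, not the distributed algorithms evaluated in the paper; the function name `rknn_brute_force`, the sample points, and the choice of Euclidean distance are assumptions made for this example.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def rknn_brute_force(points, q, k):
    """Return every point p in `points` that has q among its k nearest neighbors.

    Hypothetical helper for illustration: O(n^2) distance computations,
    no spatial index, no partitioning.
    """
    result = []
    for i, p in enumerate(points):
        d_pq = dist(p, q)
        # Count the other data points that lie strictly closer to p than q does.
        closer = sum(
            1
            for j, other in enumerate(points)
            if j != i and dist(p, other) < d_pq
        )
        # q is one of p's k nearest neighbors iff fewer than k points are closer.
        if closer < k:
            result.append(p)
    return result


# Illustrative data (assumed, not taken from the paper's datasets).
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
q = (1.5, 0.0)

print(rknn_brute_force(points, q, k=1))  # [(1.0, 0.0)]
print(rknn_brute_force(points, q, k=2))  # all four points
```

The quadratic cost of this sketch is what makes the query expensive on large spatial datasets and motivates distributing the work across a cluster.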

F. García-García, A. Corral, L. Iribarne and M. Vassilakopoulos: Work funded by the MINECO research project [TIN2013-41576-R].


Notes

1. Available at http://spatialhadoop.cs.umn.edu/datasets.html.
2. Available at https://github.com/aseldawy/spatialhadoop2.
3. Available at https://github.com/merlintang/SpatialSpark.


Author information

Authors and Affiliations

Department of Informatics, University of Almeria, Almeria, Spain
Francisco García-García, Antonio Corral & Luis Iribarne
Department of Electrical and Computer Engineering, University of Thessaly, Volos, Greece
Michael Vassilakopoulos


Corresponding author

Correspondence to Antonio Corral.

Editor information

Editors and Affiliations

ISAE-ENSMA, Chasseneuil, France
Yassine Ouhammou
University of Novi Sad, Novi Sad, Serbia
Mirjana Ivanovic
UPC-Barcelona Tech, Barcelona, Spain
Alberto Abelló
ISAE-ENSMA, Chasseneuil, France
Ladjel Bellatreche


About this paper

Cite this paper

García-García, F., Corral, A., Iribarne, L., Vassilakopoulos, M. (2017). RkNN Query Processing in Distributed Spatial Infrastructures: A Performance Study. In: Ouhammou, Y., Ivanovic, M., Abelló, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2017. Lecture Notes in Computer Science, vol 10563. Springer, Cham. https://doi.org/10.1007/978-3-319-66854-3_15


DOI: https://doi.org/10.1007/978-3-319-66854-3_15
Published: 06 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66853-6
Online ISBN: 978-3-319-66854-3
eBook Packages: Computer Science, Computer Science (R0)
