Efficient Multidimensional AkNN Query Processing in the Cloud

Nodarakis, Nikolaos; Pitoura, Evaggelia; Sioutas, Spyros; Tsakalidis, Athanasios; Tsoumakos, Dimitrios; Tzimas, Giannis

doi:10.1007/978-3-319-10073-9_41

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8644)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

A k-nearest neighbor (kNN) query retrieves the k points nearest to a given location under a chosen distance metric. An all k-nearest neighbor (AkNN) query is a variation of the kNN query that retrieves the k nearest points for each point in a database. Such queries are used mainly in spatial databases and form the backbone of many applications, location-based ones in particular. In this work, we propose a novel method for classifying multidimensional data using an AkNN algorithm in the MapReduce framework. Our approach exploits space decomposition techniques to carry out the classification procedure in a parallel and distributed manner. To our knowledge, we are the first to study the kNN classification of multidimensional objects from this perspective. Through an extensive experimental evaluation, we demonstrate that our solution is efficient, robust and scalable in processing the given queries.
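
To make the query definitions concrete, the following is a minimal single-machine sketch of a kNN query, an AkNN query, and kNN classification by majority vote. It is a brute-force illustration only, not the MapReduce method proposed in the paper: it performs no space decomposition and no parallel processing. The function names (`knn`, `aknn`, `classify`), the Euclidean metric, the majority-vote rule, and the sample data are all assumptions made for this example.

```python
import heapq
import math
from collections import Counter


def knn(query, points, k):
    """Return the k points nearest to `query` (Euclidean distance assumed)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(query, p))


def aknn(points, k):
    """Brute-force AkNN: for each point, find its k nearest neighbours
    among the remaining points. Result i corresponds to points[i]."""
    result = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]  # exclude the point itself
        result.append(knn(p, others, k))
    return result


def classify(query, labelled, k):
    """Assign `query` the majority label among its k nearest labelled points.
    `labelled` is a list of (point, label) pairs (hypothetical layout)."""
    nearest = heapq.nsmallest(
        k, labelled, key=lambda item: math.dist(query, item[0])
    )
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Illustrative data only.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
neighbours = aknn(points, k=2)

labelled = [
    ((0.0, 0.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((6.0, 5.0), "b"), ((5.0, 6.0), "b"),
]
label_near_origin = classify((0.5, 0.2), labelled, k=3)
label_far = classify((5.5, 5.5), labelled, k=3)
```

This brute-force version compares every pair of points, so its cost grows quadratically with the database size. That cost is what makes decomposing the space and distributing the work attractive for large inputs.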

Author information

Authors and Affiliations

Computer Engineering and Informatics Department, University of Patras, 26500, Patras, Greece
Nikolaos Nodarakis & Athanasios Tsakalidis
Computer Science Department, University of Ioannina, Greece
Evaggelia Pitoura
Department of Informatics, Ionian University, 49100, Corfu, Greece
Spyros Sioutas & Dimitrios Tsoumakos
Computer & Informatics Engineering Department, Technological Educational Institute of Western Greece, 26334, Patras, Greece
Giannis Tzimas

Editor information

Editors and Affiliations

Instituto Tecnológico de Informática, 46022, Valencia, Spain
Hendrik Decker
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, 166 27, Prague 6, Czech Republic
Lenka Lhotská
Department of Computer Science, The University of Auckland, 1010, Auckland, New Zealand
Sebastian Link
Knowledge Management, LMU University of Munich, Leopoldstraße 13, 80802, Munich, Germany
Marcus Spies
University of Linz, FAW, Altenbergerstrasse 69, 4040, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Nodarakis, N., Pitoura, E., Sioutas, S., Tsakalidis, A., Tsoumakos, D., Tzimas, G. (2014). Efficient Multidimensional AkNN Query Processing in the Cloud. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8644. Springer, Cham. https://doi.org/10.1007/978-3-319-10073-9_41

DOI: https://doi.org/10.1007/978-3-319-10073-9_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10072-2
Online ISBN: 978-3-319-10073-9
eBook Packages: Computer Science, Computer Science (R0)
