Nearest base-neighbor search on spatial datasets

Abstract

This paper presents a nearest base-neighbor (NBN) search that can be applied to a clustered nearest neighbor problem on spatial datasets with static properties. Given two sets of data points R and S, a query point q, distance threshold δ and cardinality threshold k, the NBN query retrieves a nearest point r (called the base-point) in R where more than k points in S are located within the distance δ. In this paper, we formally define a base-point and NBN problem. As the brute-force approach to this problem in massive datasets has large computational and I/O costs, we propose in-memory and external memory processing techniques for NBN queries. In particular, our proposed in-memory algorithms are used to minimize I/Os in the external memory algorithms. Furthermore, we devise a solution-based index, which we call the neighborhood-augmented grid, to dramatically reduce the search space. A performance study is conducted both on synthetic and real datasets. Our experimental results show the efficiency of our proposed approach.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16

References

  1. 1.

    Aly AM, Aref WG, Ouzzani M (2012) Spatial queries with two kNN predicates. PVLDB 5(11):1100–1111. https://doi.org/10.14778/2350229.2350231

    Article  Google Scholar 

  2. 2.

    Aly AM, Aref WG, Ouzzani M (2015) Spatial queries with k-nearest-neighbor and relational predicates. In: Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems. https://doi.org/10.1145/2820783.2820815

  3. 3.

    Cao X, Cong G, Jensen CS, Ooi BC (2011) Collective spatial keyword querying. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 373–384. https://doi.org/10.1145/1989323.1989363

  4. 4.

    Chen Y, Patel JM (2007) Efficient evaluation of all-nearest-neighbor queries. In: Proceedings of IEEE international conference on data engineering, pp 1056–1065. https://doi.org/10.1109/ICDE.2007.368964

  5. 5.

    Cheung K, Fu AWC (1998) Enhanced nearest neighbor search on the R-tree. ACM SIGMOD Record 27(3):16–21. https://doi.org/10.1145/290593.290596

    MathSciNet  Article  Google Scholar 

  6. 6.

    Choi DW, Chung CW (2015) Nearest neighborhood search in spatial databases. In: Proceedings of IEEE international conference on data engineering, pp 699–710. https://doi.org/10.1109/ICDE.2015.7113326

  7. 7.

    Deng K, Sadiq SW, Zhou X, Xu H, Fung GPC, Lu Y (2012) On group nearest group query processing. IEEE Trans Knowl Data Eng 24(2):295–308. https://doi.org/10.1109/TKDE.2010.230

    Article  Google Scholar 

  8. 8.

    Deng K, Zhou X, Shen HT, Xu K, Lin X (2006) Surface k-NN query processing. In: Proceedings of IEEE international conference on data engineering, p 78. https://doi.org/10.1109/ICDE.2006.152

  9. 9.

    Gan J, Tao Y (2015) DBSCAN Revisited: Mis-Claim, Un-fixability, and approximation. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 519–530. https://doi.org/10.1145/2723372.2737792

  10. 10.

    Gao Y, Liu Q, Miao X, Yang J (2016) Reverse k-nearest neighbor search in the presence of obstacles. Inf Sci 330(10):274–292. https://doi.org/10.1016/j.ins.2015.10.022

    Article  Google Scholar 

  11. 11.

    Gunawan A (2013) A faster algorithm for DBSCAN. Master’s Thesis. Technische University Eindhoven, Eindhoven

  12. 12.

    Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 47–57. https://doi.org/10.1145/971697.602266

    Article  Google Scholar 

  13. 13.

    Hjaltason G, Samet H (1999) Distance browsing in spatial databases. ACM Trans Database Syst 24(2):265–318. https://doi.org/10.1145/320248.320255

    Article  Google Scholar 

  14. 14.

    Kolahdouzan MR, Shahabi C (2004) Voronoi-based k nearest neighbor search for spatial network databases. In: International conference on very large data bases, pp 840–851

    Google Scholar 

  15. 15.

    Korn F, Muthukrishnan S (2000) Influence sets based on reverse nearest neighbor queries. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 201–212. https://doi.org/10.1145/335191.335415

    Article  Google Scholar 

  16. 16.

    Lee EY, Cho HJ, Chung TS, Ryu KY (2015) Moving range k nearest neighbor queries with quality guarantee over uncertain moving objects. Inf Sci 325(20):324–341. https://doi.org/10.1016/j.ins.2015.07.034

    MathSciNet  Article  Google Scholar 

  17. 17.

    Papadias D, Kalnis P, Zhang J, Tao Y (2001) Efficient OLAP operations in spatial data warehouse. In: Proceedings of the international symposium on spatial and temporal databases, pp 443–459

    Google Scholar 

  18. 18.

    Papadias D, Shen Q, Tao Y, Mouratidis K (2004) Group nearest neighbor queries. In: Proceedings of IEEE international conference on data engineering. https://doi.org/10.1109/ICDE.2004.1320006

  19. 19.

    Papadias D, Tao Y, Mouratidis K, Hui CK (2005) Aggregate nearest neighbor queries in spatial databases. ACM Trans Database Syst 30(2):529–576. https://doi.org/10.1145/1071610.1071616

    Article  Google Scholar 

  20. 20.

    Roussopoulos N, Kelly S, Vincent F (1995) Nearest neighbor queries. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 71–79

  21. 21.

    Tao Y, Papadias D, Lian X (2002) Continuous nearest neighbor search. In: International conference on very large data bases, pp 287–298

    Google Scholar 

  22. 22.

    Yi S, Ryu H, Son J, Chung YD (2014) View field nearest neighbor: a novel type of spatial queries. Inf Sci 275:68–82. https://doi.org/10.1016/j.ins.2014.02.022

    MathSciNet  Article  Google Scholar 

  23. 23.

    Zhang D, Chan CY, Tan KL (2013) Nearest group queries. In: Proceeding of the conference on scientific and statistical database management. https://doi.org/10.1145/2484838.2484866

  24. 24.

    Zhang D, Chee YM, Mondal A, Tung AKH, Kitsuregawa M (2009) Keyword search in spatial databases: Towards searching by document. In: Proceedings of IEEE international conference on data engineering, pp 688–699. https://doi.org/10.1109/ICDE.2009.77

  25. 25.

    Zhang J, Mamoulis N, Papadias D, Tao Y (2004) All-nearest-neighbors queries in spatial databases. In: Proceedings of the 16th international conference on scientific and statistical database management, pp 297–306. https://doi.org/10.1109/SSDM.2004.1311221

  26. 26.

    U.S. Cencus Bureau. Tiger/Line Shapefiles, https://www.census.gov/geo/maps-data/data/tiger.html. Accessed on 26 Mar 2019

Download references

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) Grant Funded by the Korean Government (MSIP) (NRF-2016R1A2B1014013) and by Basic Science Research Program through the National Research Foundation of Korea (NRF) Funded by the Ministry of Education (2016R1D1A1B03930907)

Author information

Affiliations

Authors

Corresponding author

Correspondence to Soon-Young Jung.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

A. Construction of grid

Algorithm 8 shows the construction procedure of the grid used in DPA. The algorithm takes two data sets (RS and SS) and δ as input. We initialize variables for grid construction and calculate the number of cells (Lines 1–6). We initialize each cell ci iteratively by the number of cells and put ci with the key value i into the grid map (Lines 7–13). In the example of Fig. 17, the key value of c10 is 10. Finally, we find the cell c where the point is located for each data point of RS and SS through Algorithm 9 and update the SCR(c) or MCR(c) or c.SS (Lines 14–20). Figure 17 shows an example of a grid construction. nCols is 5, nRows is 5 and the number of cells is 25. RN(c13) represents 21 gray cells including c13.

figurel
figurem
Fig. 17
figure17

An example of a grid in DPA. RN(c13) is a set of gray cells

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Jang, H., Hyun, K., Chung, J. et al. Nearest base-neighbor search on spatial datasets. Knowl Inf Syst 62, 867–897 (2020). https://doi.org/10.1007/s10115-019-01360-3

Download citation

Keywords

  • Information technology
  • k-nearest neighbor query
  • Group version of nearest neighbor query
  • Nearest base-neighbor query
  • Spatial databases