Abstract
We develop a context-sensitive, linear-time K-nearest neighbor search method, wherein the test object and its neighborhood (in the training dataset) are required to share a similar structure via establishing bilateral relations. Our approach in particular handles two types of irregularities: (i) when the (test) objects are outliers, i.e. they do not belong to any of the existing structures in the (training) dataset, and (ii) when the structures (e.g. classes) in the dataset have diverse densities. Instead of aiming to capture the correct underlying structure of the whole data, we extract the correct structure only in the neighborhood of the test object, which makes our search strategy computationally efficient. We investigate the performance of our method on a variety of real-world datasets and demonstrate its superior performance compared to the alternatives.
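To make the bilateral-relation idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: a training point is accepted as a neighbor of the test object only if the test object is, in turn, at least as close to it as its own K-th nearest training neighbor (in the spirit of mutual k-nearest neighbors). The function name, the acceptance test, and the `factor` slack parameter are all assumptions introduced for illustration.

```python
import numpy as np

def bilateral_knn(x, X, K, factor=1.0):
    """Illustrative sketch: return up to K training indices that both
    are near x and 'accept' x back, i.e. x lies within (factor times)
    their own K-th nearest-neighbor distance inside the training set.
    If no point accepts x, it is flagged as a potential outlier."""
    d_test = np.linalg.norm(X - x, axis=1)   # distances from x to all training points
    order = np.argsort(d_test)               # candidates, nearest first
    accepted = []
    for i in order:
        # K-th nearest training neighbor of candidate i
        # (index K skips the zero self-distance at index 0)
        d_i = np.linalg.norm(X - X[i], axis=1)
        kth = np.partition(d_i, K)[K]
        if d_test[i] <= factor * kth:        # bilateral relation holds
            accepted.append(i)
        if len(accepted) == K:
            break
    is_outlier = len(accepted) == 0
    return accepted, is_outlier
```

Because the acceptance threshold is each candidate's own local K-th neighbor distance, dense and sparse structures are judged on their own scale, which is one way such a scheme can cope with classes of diverse densities.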
Part of this work was done at NAVER LABS Europe.
Notes
1. Note that our approach is orthogonal to methods such as Minimax distances [4], with which it can be combined and used together.
2. Note that if we aim to investigate the algorithm for different K in an incremental manner, then for each K we only need to compute the \(K^{th}\) nearest neighbor, as this search has already been performed for \(K-1\) in the previous step.
3. In our experiments, ENN and IKNN perform very similarly, as neither provides an outlier detection mechanism. We therefore only report the scores of IKNN.
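The incremental scheme in note 2 can be sketched as follows: if the training distances are organized once in a min-heap, each increment of K pops exactly one additional neighbor, so the work done for \(K-1\) is reused rather than repeated. This is an illustrative sketch under that assumption, not the paper's implementation.

```python
import heapq
import numpy as np

def incremental_neighbors(x, X):
    """Generator yielding (index, distance) pairs in order of
    increasing distance to x: the K-th call to next() returns exactly
    the K-th nearest neighbor, reusing all earlier work."""
    d = np.linalg.norm(X - x, axis=1)
    heap = [(dist, i) for i, dist in enumerate(d)]
    heapq.heapify(heap)          # O(n) once; each further neighbor costs O(log n)
    while heap:
        dist, i = heapq.heappop(heap)
        yield i, dist

# Usage: advance one neighbor at a time as K grows.
X = np.array([[0.0], [1.0], [2.0]])
nn = incremental_neighbors(np.array([0.9]), X)
first, _ = next(nn)    # nearest neighbor (K = 1)
second, _ = next(nn)   # only the 2nd neighbor is computed for K = 2
```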
References
Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)
Brito, M.R., Chávez, E.L., Quiroz, A.J., Yukich, J.E.: Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Stat. Probab. Lett. 35(1), 33–42 (1997)
Chebotarev, P.: A class of graph-geodetic distances generalizing the shortest-path and the resistance distances. Discrete Appl. Math. 159(5), 295–302 (2011)
Chehreghani, M.H.: K-nearest neighbor search and outlier detection via minimax distances. In: Proceedings of the 2016 SIAM International Conference on Data Mining, pp. 405–413 (2016)
Domeniconi, C., Peng, J., Gunopulos, D.: Locally adaptive metric nearest-neighbor classification. IEEE Trans. Pattern Anal. Mach. Intell. 24(9), 1281–1285 (2002)
Fischer, B., Buhmann, J.M.: Path-based clustering for grouping of smooth curves and texture segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 25(4), 513–518 (2003)
Fouss, F., Francoisse, K., Yen, L., Pirotte, A., Saerens, M.: An experimental investigation of kernels on graphs for collaborative recommendation and semisupervised classification. Neural Netw. 31, 53–72 (2012)
Fouss, F., Pirotte, A., Renders, J.-M., Saerens, M.: Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355–369 (2007)
Hautamaki, V., Karkkainen, I., Franti, P.: Outlier detection using k-nearest neighbour graph. In: Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), pp. 430–433 (2004)
Hawkins, D.M.: Identification of Outliers. Monographs on Applied Probability and Statistics. Chapman and Hall, London (1980)
Kim, K.-H., Choi, S.: Neighbor search with global geometry: a minimax message passing algorithm. In: ICML, pp. 401–408 (2007)
Kim, K.-H., Choi, S.: Walking on minimax paths for k-nn search. In: AAAI (2013)
Maier, M., Hein, M., von Luxburg, U.: Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters. Theor. Comput. Sci. 410(19), 1749–1764 (2009)
Mitchell, T.M.: Machine Learning, 1st edn. McGraw-Hill Inc., New York (1997)
Musser, D.: Introspective sorting and selection algorithms. Softw. Pract. Experience 27, 983–993 (1997)
Nadler, B., Galun, M.: Fundamental limitations of spectral clustering. In: Advances in Neural Information Processing Systems, vol. 19, pp. 1017–1024 (2007)
Parvin, H., Alizadeh, H., Minaei-Bidgoli, B.: MKNN: modified k-nearest neighbor. In: Proceedings of the World Congress on Engineering and Computer Science (WCECS) (2008)
Radovanovic, M., Nanopoulos, A., Ivanovic, M.: Hubs in space: popular nearest neighbors in high-dimensional data. J. Mach. Learn. Res. 11, 2487–2531 (2010)
Radovanovic, M., Nanopoulos, A., Ivanovic, M.: Reverse nearest neighbors in unsupervised distance-based outlier detection. IEEE Trans. Knowl. Data Eng. 27(5), 1369–1382 (2015)
Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)
Song, Y., Huang, J., Zhou, D., Zha, H., Giles, C.L.: IKNN: informative k-nearest neighbor pattern classification. In: 11th European Conference on Principles and Practice of Knowledge Discovery PKDD, pp. 248–264 (2007)
Tang, B., He, H.: ENN: extended nearest neighbor method for pattern recognition [research frontier]. IEEE Comp. Int. Mag. 10(3), 52–60 (2015)
Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)
von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
Wang, F., Zhang, C.: Label propagation through linear neighborhoods. IEEE Trans. Knowl. Data Eng. 20(1), 55–67 (2008)
Weinberger, K.Q., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 10, 207–244 (2009)
Xing, E.P., Jordan, M.I., Russell, S.J., Ng, A.Y.: Distance metric learning with application to clustering with side-information. In: Advances in Neural Information Processing Systems, vol. 15, pp. 521–528. MIT Press (2003)
Zhang, Z., Zha, H.: Principal manifolds and nonlinear dimension reduction via local tangent space alignment. SIAM J. Sci. Comput. 26, 313–338 (2002)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Haghir Chehreghani, M., Haghir Chehreghani, M. (2018). Efficient Context-Aware K-Nearest Neighbor Search. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science(), vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_35
Print ISBN: 978-3-319-76940-0
Online ISBN: 978-3-319-76941-7