Approximate Nearest Neighbor Search in Intelligent Classification Systems

  • Andrey V. Savchenko
Part of the SpringerBriefs in Optimization book series (BRIEFSOPTI)


This chapter deals with the insufficient performance of nearest neighbor (NN) classification with medium-sized databases (thousands of classes). The key weakness of widely applied approximate nearest neighbor algorithms is their heuristic nature. In contrast, we introduce a probabilistic approximate NN method that uses the asymptotic properties of the classifiers with segment homogeneity testing from Chap. 2. At every step, the joint probability density of the distances to the previously checked reference objects is estimated for each class. The next reference instance to check is selected from the class with the maximum likelihood. Experimental results in image recognition show that this maximum likelihood search is much more effective for medium-sized databases than brute force and the known approximate nearest neighbor methods.
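The search loop outlined in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's method: the abstract does not give the density that follows from the asymptotic properties of the classifier, so the sketch substitutes an independent Gaussian surrogate with a hypothetical scale `sigma`, and it treats every reference instance as its own class. The function name `ml_ann_search`, its parameters, and the precomputed matrix `ref_dists` are all assumptions made for illustration.

```python
import numpy as np


def ml_ann_search(query, refs, ref_dists, dist, max_checks,
                  threshold=0.0, sigma=1.0, start=0):
    """Greedy maximum-likelihood search for an approximate nearest neighbor.

    ASSUMPTION (not from the chapter): if reference r is the true nearest
    neighbor of the query, each observed distance d(query, r_i) is modeled
    as Gaussian around the precomputed distance ref_dists[i, r], with the
    same standard deviation sigma and independence across checked r_i.
    """
    n = len(refs)
    checked = np.zeros(n, dtype=bool)
    loglik = np.zeros(n)          # log-likelihood of each candidate, up to a constant
    best_idx, best_dist = -1, np.inf
    current, n_checked = start, 0

    for _ in range(min(max_checks, n)):
        d = dist(query, refs[current])
        checked[current] = True
        n_checked += 1
        if d < best_dist:
            best_idx, best_dist = current, d
        if best_dist <= threshold:
            break                 # close enough: stop early
        # Update every candidate's log-likelihood with the new observation.
        loglik -= (d - ref_dists[current]) ** 2 / (2.0 * sigma ** 2)
        # Next instance to check: the unchecked candidate with maximal likelihood.
        current = int(np.argmax(np.where(checked, -np.inf, loglik)))

    return best_idx, best_dist, n_checked


def euclidean(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))


# Usage on synthetic data (all values here are illustrative only).
rng = np.random.default_rng(0)
refs = rng.normal(size=(200, 16))
ref_dists = np.array([[euclidean(a, b) for b in refs] for a in refs])
query = refs[57] + 0.01 * rng.normal(size=16)
idx, d, n_checked = ml_ann_search(query, refs, ref_dists, euclidean,
                                  max_checks=20, threshold=0.1)
print(idx, d, n_checked)
```

The pairwise matrix `ref_dists` costs quadratic memory in the number of reference instances, which is one reason a sketch like this only suits medium-sized databases. Setting `max_checks` to the database size makes the loop degrade to brute force in the worst case.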



Copyright information

© The Author(s) 2016

Authors and Affiliations

  • Andrey V. Savchenko
    • 1
  1. 1. Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
