
Quantifying the Invariance and Robustness of Permutation-Based Indexing Schemes

  • Stéphane Marchand-Maillet
  • Edgar Roman-Rangel
  • Hisham Mohamed
  • Frank Nielsen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9939)

Abstract

Providing fast and accurate (exact or approximate) access to large-scale multidimensional data is a ubiquitous problem that dates back to the early days of large-scale information systems. Similarity search, which requires resolving nearest neighbor (NN) queries, is a fundamental tool for structuring information spaces. Permutation-based Indexing (PBI) is a reference-based indexing scheme that accelerates NN search by combining landmark points with rankings used in place of distance calculations.
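As a rough illustration of the scheme just described, the Python sketch below shows one common, simplified form of PBI. It is our assumption of a typical setup, not the authors' implementation. Each data point is represented by the order in which it "sees" a small set of reference (landmark) points, permutations are compared with the Spearman footrule, and exact distances are computed only on a short list of candidates. The reference set, the toy data, the footrule choice, and the `shortlist` parameter are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def permutation(x: Point, refs: Sequence[Point]) -> List[int]:
    """Reference indices sorted by increasing distance to x (ties broken by index)."""
    return sorted(range(len(refs)), key=lambda j: (math.dist(x, refs[j]), j))


def inverse(perm: Sequence[int]) -> List[int]:
    """inv[j] is the rank (position) of reference j within perm."""
    inv = [0] * len(perm)
    for pos, j in enumerate(perm):
        inv[j] = pos
    return inv


def footrule(p: Sequence[int], q: Sequence[int]) -> int:
    """Spearman footrule: sum of the rank displacements of each reference."""
    return sum(abs(a - b) for a, b in zip(inverse(p), inverse(q)))


def build_index(data: Sequence[Point], refs: Sequence[Point]) -> List[List[int]]:
    """Offline step: store one permutation per data point."""
    return [permutation(x, refs) for x in data]


def search(q: Point, data: Sequence[Point], refs: Sequence[Point],
           index: Sequence[Sequence[int]], k: int = 1, shortlist: int = 2) -> List[int]:
    """Return the indices of the k approximate nearest neighbors of q."""
    pq = permutation(q, refs)  # only len(refs) distance computations
    # Linear scan: rank all points by permutation similarity, keep a short list.
    candidates = sorted(range(len(data)),
                        key=lambda i: (footrule(pq, index[i]), i))[:shortlist]
    # Refine the short list with exact distances.
    return sorted(candidates, key=lambda i: (math.dist(q, data[i]), i))[:k]


# Hypothetical toy data in the plane.
refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(1.0, 2.0), (9.0, 1.0), (2.0, 9.0), (6.0, 4.0)]
index = build_index(data, refs)
print(index)                                  # [[0, 2, 1], [1, 0, 2], [2, 0, 1], [1, 0, 2]]
print(search((8.0, 2.0), data, refs, index))  # [1]
```

Apart from the query's distances to the reference points and the few refinement steps, the search only compares small integer rankings. Note that points 1 and 3 share the same permutation, which is exactly the kind of approximation PBI makes. Practical systems often store only a prefix of each permutation and avoid the linear scan (e.g. with inverted files), which this sketch omits.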

In this paper, we seek to understand the approximation made by the PBI scheme. Our aim is to assess the robustness of the scheme by modeling it and quantifying its invariance properties. After discussing the geometry of PBI in relation to the study of rankings, we draw on empirical evidence to make proposals that address the inconsistencies of this structure.
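The kind of approximation at stake can be illustrated with a toy two-dimensional configuration. The reference points and data below are hypothetical and chosen for illustration; they are not the paper's experiments. A point's permutation depends only on which side of each bisector between two references it lies, so it is constant over each cell of the resulting partition. Distant points in one cell are therefore indistinguishable (invariance), while arbitrarily close points straddling a bisector receive different permutations (a robustness concern). The `permutation` helper is restated here as an assumption so that the example is self-contained.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def permutation(x: Point, refs: Sequence[Point]) -> Tuple[int, ...]:
    """Reference indices sorted by increasing distance to x (ties broken by index)."""
    return tuple(sorted(range(len(refs)), key=lambda j: (math.dist(x, refs[j]), j)))


# Hypothetical reference points in a toy 2-D setting.
refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]

# Invariance: two distant points inside the same cell share one permutation,
# so a PBI index cannot tell them apart.
a, b = (1.0, 2.0), (-8.0, 4.0)
print(permutation(a, refs), permutation(b, refs), round(math.dist(a, b), 2))
# (0, 2, 1) (0, 2, 1) 9.22

# Fragility: two nearby points on either side of the bisector y = x of
# references 1 and 2 receive different permutations.
c, d = (2.0, 2.01), (2.01, 2.0)
print(permutation(c, refs), permutation(d, refs), round(math.dist(c, d), 3))
# (0, 2, 1) (0, 1, 2) 0.014
```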

Keywords

Permutation-based indexing, Ranking, Geometry

Notes

Acknowledgments

This work has been partly supported by the Swiss National Science Foundation under project MAAYA (SNF Grant number 144238).

Dr. Hisham Mohamed is now with Sensirion AG, Staefa, Switzerland.


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Stéphane Marchand-Maillet (1)
  • Edgar Roman-Rangel (1)
  • Hisham Mohamed (1)
  • Frank Nielsen (2)
  1. Department of Computer Science, University of Geneva, Geneva, Switzerland
  2. LIX, Polytechnique, Paris, France
