The List of Clusters Revisited

  • Eric Sadit Tellez
  • Edgar Chávez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7329)

Abstract

One of the most efficient indexes for similarity search (think, for instance, of speeding up k-nn searches in a very large database) is the so-called list of clusters. This data structure is a counterintuitive construction that can be seen as extremely unbalanced, in contrast to the balanced data structures typical of exact searching. In practical terms, there is no better alternative for exact indexing, where every search must return all the qualifying results, as opposed to approximate similarity search. The major drawback of the list of clusters is its quadratic construction time.
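As a rough illustration of the structure described above, the following Python sketch builds a list of clusters with a fixed number m of objects per bucket and answers range queries. It is a simplified sketch, not the authors' implementation. The class name `ListOfClusters`, the counter `build_evals`, the bucket size `m`, and the rule for choosing the next center (the unassigned object farthest from the current center) are illustrative assumptions. Each center is compared against every object not yet assigned, which is where the quadratic construction cost comes from. At search time, clusters are visited in construction order, and the scan can stop early once the query ball lies inside a cluster's ball.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class ListOfClusters:
    """Illustrative list of clusters with a fixed bucket size m (hypothetical API)."""

    def __init__(self, data: Sequence[Point], m: int,
                 dist: Callable[[Point, Point], float] = math.dist) -> None:
        self.dist = dist
        self.build_evals = 0  # distance evaluations spent during construction
        # Each cluster is stored as (center, covering radius, objects in the bucket).
        self.clusters: List[Tuple[Point, float, List[Point]]] = []

        rest = list(data)
        center: Optional[Point] = rest.pop(0) if rest else None
        while center is not None:
            # Compare the center against EVERY unassigned object. Repeating this
            # for each cluster is what makes construction quadratic.
            scored = sorted(((self._build_dist(center, x), x) for x in rest),
                            key=lambda t: t[0])
            bucket, remaining = scored[:m], scored[m:]
            cr = bucket[-1][0] if bucket else 0.0  # covering radius
            self.clusters.append((center, cr, [x for _, x in bucket]))
            # Assumption: the next center is the unassigned object farthest from
            # the current center (this reuses a distance already computed).
            center = remaining.pop()[1] if remaining else None
            rest = [x for _, x in remaining]

    def _build_dist(self, a: Point, b: Point) -> float:
        self.build_evals += 1
        return self.dist(a, b)

    def range_search(self, q: Point, r: float) -> List[Point]:
        """Return every indexed object within distance r of q."""
        out: List[Point] = []
        for center, cr, bucket in self.clusters:
            dqc = self.dist(q, center)
            if dqc <= r:
                out.append(center)
            if dqc <= cr + r:
                # The query ball may intersect the cluster ball: scan the bucket.
                out.extend(x for x in bucket if self.dist(q, x) <= r)
            if dqc + r < cr:
                # The query ball lies strictly inside this cluster's ball. Every
                # later object x has d(center, x) >= cr, hence d(q, x) > r: stop.
                break
        return out


if __name__ == "__main__":
    random.seed(1)
    pts = [tuple(random.random() for _ in range(3)) for _ in range(1000)]
    idx = ListOfClusters(pts, m=16)
    print(len(idx.clusters), idx.build_evals, len(idx.range_search(pts[0], 0.2)))
```

The early exit is valid because every object placed in a later cluster was left over after this bucket was filled, so its distance to this center is at least the covering radius. The triangle inequality then rules it out.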

In this paper we revisit the list of clusters, aiming to speed up its construction without sacrificing its search efficiency. We obtain similar search times while significantly reducing construction time.
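For reference when reading the construction-time claims, the baseline cost of the classic construction can be counted as follows. This is a back-of-the-envelope derivation under our own assumptions: n objects, a fixed bucket size m (so each cluster holds its center plus at most m objects), and a center-selection rule that reuses distances already computed. It describes the classic construction, not the faster one proposed in the paper. The k-th center is compared with the n - 1 - k(m+1) objects that are still unassigned:

```latex
C(n, m) \;=\; \sum_{k=0}^{\lceil n/(m+1) \rceil - 1} \bigl(n - 1 - k(m+1)\bigr)
\;\approx\; \frac{n^{2}}{2(m+1)} \;=\; \Theta\!\left(\frac{n^{2}}{m}\right)
```

For example, with n = 200 and m = 10 the sum is exactly 1900 distance evaluations (the approximation gives about 1818). For any fixed bucket size, the cost therefore grows quadratically with n.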


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Eric Sadit Tellez (1)
  • Edgar Chávez (1)
  1. Universidad Michoacana de San Nicolás de Hidalgo, México
