The List of Clusters Revisited
One of the most efficient indexes for similarity search is the so-called list of clusters; to fix ideas, think of speeding up k-nn searches in a very large database. This data structure is a counterintuitive construction that can be seen as extremely unbalanced, as opposed to the balanced data structures used for exact searching. In practical terms there is no better alternative for exact indexing, where every search returns all the relevant results, as opposed to approximate similarity search. The major drawback of the list of clusters is its quadratic construction time.
In this paper we revisit the list of clusters, aiming to speed up its construction without sacrificing search efficiency. We obtain similar search times while saving a significant amount of time in the construction phase.
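For readers unfamiliar with the structure, the following is a minimal sketch of a classic list of clusters with fixed-size buckets, together with an exact range search. It is not the faster construction proposed in this paper. The function names, the choice of the first remaining element as center, and the `bucket_size` parameter are assumptions made only for illustration.

```python
import math


def euclidean(a, b):
    """Example metric; any function satisfying the triangle inequality works."""
    return math.dist(a, b)


def build_list_of_clusters(points, dist, bucket_size):
    """Return a list of (center, covering_radius, bucket) triples.

    Each bucket holds (distance_to_center, element) pairs.
    """
    remaining = list(points)
    clusters = []
    while remaining:
        # Assumption: take the first remaining element as the center.
        center = remaining.pop(0)
        # Every step measures the center against all remaining elements;
        # with a constant bucket size this gives a quadratic number of
        # distance computations overall.
        scored = sorted(((dist(center, p), p) for p in remaining),
                        key=lambda pair: pair[0])
        bucket = scored[:bucket_size]
        remaining = [p for _, p in scored[bucket_size:]]
        radius = bucket[-1][0] if bucket else 0.0
        clusters.append((center, radius, bucket))
    return clusters


def range_search(clusters, dist, query, r):
    """Return every indexed element within distance r of query."""
    results = []
    for center, radius, bucket in clusters:
        d = dist(query, center)
        if d <= r:
            results.append(center)
        # The query ball intersects the cluster ball: scan the bucket.
        if d - r <= radius:
            results.extend(p for _, p in bucket if dist(query, p) <= r)
        # The query ball lies strictly inside the cluster ball, so no
        # later cluster can hold a result and the scan can stop.
        if d + r < radius:
            break
    return results
```

Calling `build_list_of_clusters(points, euclidean, 8)` and then `range_search(clusters, euclidean, query, 0.2)` returns the same elements as a linear scan. The early stop in the last test is what the unbalanced, ordered shape of the list makes possible.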