A Brief Index for Proximity Searching

  • Eric Sadit Téllez
  • Edgar Chávez
  • Antonio Camarena-Ibarrola
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5856)

Abstract

Many pattern recognition tasks can be modeled as proximity searching. Here the common task is to quickly find all the elements close to a given query without sequentially scanning a very large database.

A recent shift in the searching paradigm has been established by using permutations instead of distances to predict proximity. Every object in the database records how it sees the set of reference objects (the permutants), i.e. only their relative positions are used. When a query arrives, the relative displacements of the permutants between the query and a particular object are measured. This approach turned out to be the most efficient and scalable, at the expense of losing recall in the answers. In practice, the permutation of every object is represented with κ short integers, producing bulky indexes of 16κn bits for a database of n objects.
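The permutation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy one-dimensional data, and the use of the sum of squared position differences as the Spearman ρ variant are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def permutation(obj, permutants: Sequence, dist: Callable) -> List[int]:
    """Return permutant indices ordered by increasing distance to obj.

    Ties are broken by permutant index so the result is deterministic.
    """
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(obj, permutants[i]), i))


def inverse(perm: Sequence[int]) -> List[int]:
    """inv[p] is the position at which permutant p appears in perm."""
    inv = [0] * len(perm)
    for pos, p in enumerate(perm):
        inv[p] = pos
    return inv


def spearman_rho(perm_a: Sequence[int], perm_b: Sequence[int]) -> int:
    """Sum of squared differences between the positions of each permutant.

    Assumption: this unnormalized form is used as the permutation distance.
    """
    inv_a, inv_b = inverse(perm_a), inverse(perm_b)
    return sum((x - y) ** 2 for x, y in zip(inv_a, inv_b))


# Toy example (hypothetical data): points on a line, absolute difference.
dist = lambda a, b: abs(a - b)
permutants = [0.0, 5.0, 10.0]
database = [1.0, 9.0]

index = [permutation(o, permutants, dist) for o in database]
query_perm = permutation(2.0, permutants, dist)

# Rank database objects by how similar their permutation is to the query's.
ranking = sorted(range(len(database)),
                 key=lambda i: spearman_rho(query_perm, index[i]))
```

Objects whose permutations are closest to the query's permutation are then checked first against the real distance function, which is where the savings over a sequential scan would come from.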

In this paper we show how to represent the permutation as a binary vector, using just one bit for each permutant (instead of log κ bits in the plain representation). The Hamming distance between binary signatures is then used to predict proximity between objects in the database. We tested this approach with many real-life metric databases, obtaining faster queries with a recall close to that of the Spearman ρ while using 16 times less space.
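The abstract does not state how each bit is derived from the permutation, so the sketch below assumes one plausible rule: bit i is set when permutant i is displaced by more than a threshold from position i. The names `brief_signature`, `hamming`, `index_bits` and the `threshold` parameter are hypothetical and serve only to illustrate the one-bit-per-permutant idea and the resulting space saving.

```python
from typing import Sequence


def brief_signature(perm: Sequence[int], threshold: int) -> int:
    """Encode a permutation as an integer bit vector, one bit per permutant.

    Assumed rule (not taken from the abstract): bit p is 1 when permutant p
    appears more than `threshold` positions away from position p.
    """
    bits = 0
    for pos, p in enumerate(perm):
        if abs(pos - p) > threshold:
            bits |= 1 << p
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which the two signatures differ."""
    return bin(a ^ b).count("1")


def index_bits(kappa: int, n: int, bits_per_permutant: int) -> int:
    """Total index size in bits for n objects and kappa permutants."""
    return bits_per_permutant * kappa * n


# Toy example with kappa = 4 permutants and threshold 1.
sig_identity = brief_signature([0, 1, 2, 3], threshold=1)
sig_reversed = brief_signature([3, 2, 1, 0], threshold=1)
sig_swapped = brief_signature([1, 0, 2, 3], threshold=1)

# Space comparison: 16-bit short integers versus one bit per permutant.
plain = index_bits(kappa=64, n=1_000_000, bits_per_permutant=16)
brief = index_bits(kappa=64, n=1_000_000, bits_per_permutant=1)
ratio = plain // brief
```

With 16-bit integers per permutant, the plain index occupies 16κn bits against κn bits for the binary signature, which matches the 16-fold reduction quoted above.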

Keywords

Pattern Recognition Task; Database Element; Multimedia Information Retrieval; Fast Query; Average Search Time

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Eric Sadit Téllez (1)
  • Edgar Chávez (1, 2)
  • Antonio Camarena-Ibarrola (1)

  1. Universidad Michoacana
  2. CICESE
