Scalable Pattern Search Analysis

  • Eric Sadit Tellez
  • Edgar Chavez
  • Mario Graff
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6718)

Abstract

Efficiently searching for patterns in very large collections of objects is a very active area of research. Over the last few years, a number of indexes have been proposed to speed up the search procedure. In this paper, we introduce a novel framework (the K-nearest references) in which several approximate proximity indexes can be analyzed and understood. The search spaces in which the analyzed indexes work range from vector spaces and general metric spaces to general similarity spaces.

The proposed framework clarifies the principles behind the search complexity and allows us to propose a number of novel indexes whose salient characteristics are a high recall rate, low search time, and a linear storage requirement.
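The abstract does not spell out how a K-nearest-references index is built, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each object is represented by the identifiers of its K closest reference objects, that these identifiers are stored in an inverted index, and that a candidate list is formed from objects sharing references with the query before being re-ranked by the true distance. The function names (`k_nearest_refs`, `build_index`, `search`) and the parameters `n_candidates` and `n_results` are hypothetical.

```python
import math
from collections import defaultdict


def k_nearest_refs(obj, refs, k, dist):
    """Return the positions (in refs) of the k references closest to obj."""
    order = sorted(range(len(refs)), key=lambda i: dist(obj, refs[i]))
    return order[:k]


def build_index(db, refs, k, dist):
    """Build an inverted index: reference id -> ids of objects having it among their k nearest.

    Each object contributes exactly k entries, so storage grows linearly
    with the collection size for a fixed k (an assumption of this sketch).
    """
    index = defaultdict(list)
    for oid, obj in enumerate(db):
        for r in k_nearest_refs(obj, refs, k, dist):
            index[r].append(oid)
    return index


def search(query, db, refs, index, k, dist, n_candidates, n_results=1):
    """Approximate nearest-neighbor search over the inverted index."""
    # Count how many of the query's k nearest references each object shares.
    shared = defaultdict(int)
    for r in k_nearest_refs(query, refs, k, dist):
        for oid in index.get(r, ()):
            shared[oid] += 1
    # Candidate list: objects sharing the most references with the query.
    candidates = sorted(shared, key=lambda oid: (-shared[oid], oid))[:n_candidates]
    # Refine the candidates with the true distance function.
    candidates.sort(key=lambda oid: dist(query, db[oid]))
    return candidates[:n_results]


if __name__ == "__main__":
    # Toy 2-D collection with Euclidean distance (illustrative data only).
    db = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    refs = [(0, 0), (1, 1), (10, 10), (11, 11), (5, 5)]
    index = build_index(db, refs, k=2, dist=math.dist)
    print(search((0.2, 0.1), db, refs, index, k=2, dist=math.dist, n_candidates=3))
```

In this sketch only the objects in the candidate list are compared with the query using the true distance, which is where an index of this kind would trade recall for search time; the overlap count is one of several possible ways to rank candidates.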

Keywords

Candidate List; Inverted Index; Scalable Pattern; Longest Common Subsequence; Inverted List
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Eric Sadit Tellez (1)
  • Edgar Chavez (1)
  • Mario Graff (1)
  1. Universidad Michoacana de San Nicolas de Hidalgo, México
