Filtration with q-samples in approximate string matching

  • Conference paper
Combinatorial Pattern Matching (CPM 1996)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1075))

Abstract

Two filtration schemes are presented for approximate string matching with k differences. In our approach, q-samples, which are non-overlapping q-grams, are drawn from the text, and a text area is checked with dynamic programming if a short sequence of the q-samples contains enough exact or slightly distorted q-grams of the pattern in the right order. The filtration schemes are applied to searching both in the text itself and in a q-gram index of the text. The results of preliminary experiments support the applicability of the new methods.
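The general idea of filtering with q-samples can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the paper's actual schemes: it tests only for exact q-grams of the pattern and ignores their order, whereas the abstract also mentions slightly distorted q-grams and an ordering requirement. The function names (`sellers_ends`, `q_sample_search`), the parameter `s` (the number of matching samples required), and the sampling step `h = (m - k - q + 1) // (k + s)` are choices made for this sketch. With that step, every approximate occurrence contains at least k + s consecutive samples, and k differences can damage at most k of them as long as the samples do not overlap (h >= q).

```python
def sellers_ends(pattern, text, k):
    """Return end indices j (exclusive) such that some substring text[i:j]
    is within edit distance k of pattern (column-wise dynamic programming)."""
    m = len(pattern)
    col = list(range(m + 1))          # D[i][0] = i
    ends = set()
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]            # D[0][j-1]
        col[0] = 0                    # an occurrence may start anywhere
        for i in range(1, m + 1):
            cur = min(col[i] + 1,                          # extra text character
                      col[i - 1] + 1,                      # missing pattern character
                      prev_diag + (pattern[i - 1] != c))   # match or substitution
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            ends.add(j)
    return ends


def q_sample_search(pattern, text, k, q=2, s=1):
    """Filter with q-samples, then verify candidate areas with dynamic programming.
    Simplified: exact q-grams only, order of the q-grams is not checked."""
    m, n = len(pattern), len(text)
    h = (m - k - q + 1) // (k + s)    # sampling step (assumption of this sketch)
    if h < q:
        raise ValueError("pattern too short for these k, q, s: samples would overlap")
    pattern_qgrams = {pattern[i:i + q] for i in range(m - q + 1)}
    starts = range(0, n - q + 1, h)   # start positions of the q-samples
    hits = [text[p:p + q] in pattern_qgrams for p in starts]
    w = k + s                         # window of consecutive samples
    ends = set()
    for first in range(len(hits) - w + 1):
        if sum(hits[first:first + w]) >= s:
            # Conservative area around the window; overlapping areas are
            # re-verified here, which a real implementation would avoid.
            lo = max(0, starts[first] - (m + k))
            hi = min(n, starts[first + w - 1] + q + m + k)
            ends |= {lo + e for e in sellers_ends(pattern, text[lo:hi], k)}
    return ends


if __name__ == "__main__":
    text = "an aproximate match and an approximate one, plus approxymate"
    print(sorted(q_sample_search("approximate", text, k=1, q=2, s=2)))
```

In this sketch, larger values of `s` demand more matching samples per window but shorten the sampling step, so fewer text areas reach the verification phase while more samples are inspected.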

The work was supported by the Academy of Finland.

Editor information

Dan Hirschberg Gene Myers

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sutinen, E., Tarhio, J. (1996). Filtration with q-samples in approximate string matching. In: Hirschberg, D., Myers, G. (eds) Combinatorial Pattern Matching. CPM 1996. Lecture Notes in Computer Science, vol 1075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61258-0_4

  • DOI: https://doi.org/10.1007/3-540-61258-0_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61258-2

  • Online ISBN: 978-3-540-68390-2

  • eBook Packages: Springer Book Archive
