On using q-gram locations in approximate string matching

  • Session 5. Chair: Hava Siegelmann
  • Conference paper
  • Published in: Algorithms — ESA '95 (ESA 1995)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 979)

Abstract

Approximate string matching with k differences is considered. Filtration of the text is a widely adopted technique for reducing the text area that must be processed by dynamic programming. A sublinear filtration algorithm is presented. The method is based on the locations of the q-grams in the pattern. Samples of q-grams are drawn from the text at fixed periods, and a text area is examined with dynamic programming only if consecutive samples appear in the pattern in approximately the same configuration. Practical experiments show that this approach gives better filtration efficiency than an earlier method.

The work was supported by the Academy of Finland.
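
The filtration idea described in the abstract can be illustrated with a small Python sketch. It is not the algorithm of the paper: the function names, the sampling period h, the number r of consecutive samples, and the tolerance test are all assumptions chosen for illustration, and no claim is made that this simplified filter never misses a match. It indexes the q-gram locations of the pattern, samples text q-grams every h positions, and reports a text area only when r consecutive samples occur in the pattern with spacing close to h. A standard dynamic programming check for k differences is included for verifying the reported areas.

```python
from collections import defaultdict


def qgram_locations(pattern, q):
    """Map each q-gram of the pattern to the list of its start positions."""
    locs = defaultdict(list)
    for i in range(len(pattern) - q + 1):
        locs[pattern[i:i + q]].append(i)
    return locs


def candidate_areas(text, pattern, q, h, r, k):
    """Return (start, end) text areas worth verifying.

    Illustrative assumption: an area is reported when r consecutive
    samples, taken every h characters, all occur in the pattern and
    the distance between successive occurrences differs from h by at
    most k.
    """
    locs = qgram_locations(pattern, q)
    m = len(pattern)
    starts = list(range(0, len(text) - q + 1, h))
    areas = []
    for s in range(len(starts) - r + 1):
        window = starts[s:s + r]
        # Pattern positions compatible with the samples seen so far.
        chain = locs.get(text[window[0]:window[0] + q], [])
        for t in window[1:]:
            nxt = locs.get(text[t:t + q], [])
            chain = [p for p in nxt
                     if any(abs((p - prev) - h) <= k for prev in chain)]
            if not chain:
                break
        if chain:
            lo = max(0, window[0] - m - k)
            hi = min(len(text), window[-1] + q + m + k)
            areas.append((lo, hi))
    return areas


def dp_match_ends(text, pattern, k):
    """End positions (exclusive) of text substrings within k differences
    of the pattern, computed column by column with edit-distance DP."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text, 1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(prev_diag + (pattern[i - 1] != c),  # match or substitution
                         old + 1,                            # extra text character
                         col[i - 1] + 1)                     # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends
```

A possible use is to call candidate_areas first and then run dp_match_ends only on each reported slice text[lo:hi], adding lo to the returned positions to map them back to the full text.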

Editor information

Paul Spirakis

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sutinen, E., Tarhio, J. (1995). On using q-gram locations in approximate string matching. In: Spirakis, P. (eds) Algorithms — ESA '95. ESA 1995. Lecture Notes in Computer Science, vol 979. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60313-1_153

  • DOI: https://doi.org/10.1007/3-540-60313-1_153

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60313-9

  • Online ISBN: 978-3-540-44913-3

  • eBook Packages: Springer Book Archive
