
Lempel-Ziv index for q-grams

  • Juha Kärkkäinen
  • Erkki Sutinen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1136)

Abstract

We present a new sublinear-size index structure for q-grams. A q-gram index of the text is used in many approximate pattern matching algorithms. All earlier q-gram indexes have at least linear size. The new method takes advantage of repetitions in the text found by Lempel-Ziv parsing.
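The abstract contrasts conventional q-gram indexes, which have at least linear size, with a structure that exploits repetitions exposed by Lempel-Ziv parsing. The Python sketch below illustrates both ingredients under our own assumptions. It is not the authors' index structure. `qgram_index` builds a conventional index that stores every q-gram position, so its size is linear in the text length. `lz77_parse` is a naive greedy LZ77-style parser. `repeated_qgram_positions` lists the q-gram occurrences that fall entirely inside a copied segment and therefore repeat an earlier occurrence. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Phrase = Tuple[int, int, str]  # (source position, copy length, literal char)


def qgram_index(text: str, q: int) -> Dict[str, List[int]]:
    """Conventional q-gram index: each q-gram maps to all its start positions.

    It stores len(text) - q + 1 positions in total, i.e. linear size.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return dict(index)


def lz77_parse(text: str) -> List[Phrase]:
    """Greedy LZ77-style parse (hypothetical helper, quadratic time).

    Each phrase copies the longest prefix of the remaining text that also
    starts at an earlier position (overlapping copies allowed), then adds
    one literal character. A source of -1 means "no copy".
    """
    phrases: List[Phrase] = []
    i, n = 0, len(text)
    while i < n:
        best_src, best_len = -1, 0
        for j in range(i):
            length = 0
            # Stop one short of the end so a literal character always remains.
            while i + length < n - 1 and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_src, best_len = j, length
        phrases.append((best_src, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases: List[Phrase]) -> str:
    """Rebuild the text from the phrases, copying character by character."""
    out: List[str] = []
    for src, length, ch in phrases:
        for k in range(length):
            out.append(out[src + k])
        out.append(ch)
    return "".join(out)


def repeated_qgram_positions(text: str, q: int, phrases: List[Phrase]) -> Dict[int, int]:
    """Map each q-gram position lying wholly inside a copied segment to the
    earlier position holding the same q-gram."""
    repeats: Dict[int, int] = {}
    start = 0
    for src, length, _ in phrases:
        for p in range(start, start + length - q + 1):
            repeats[p] = src + (p - start)
        start += length + 1
    return repeats


text = "abababab"
phrases = lz77_parse(text)
print(qgram_index(text, 2))  # {'ab': [0, 2, 4, 6], 'ba': [1, 3, 5]}
print(phrases)  # [(-1, 0, 'a'), (-1, 0, 'b'), (0, 5, 'b')]
print(repeated_qgram_positions(text, 2, phrases))  # {2: 0, 3: 1, 4: 2, 5: 3}
```

On `abababab` with q = 2, the conventional index stores all 7 occurrences. Four of them lie inside the single copied phrase and duplicate earlier occurrences. This is the kind of redundancy that an index built on the parse need not store explicitly.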

Keywords

Pattern Match · Suffix Tree · Extended Interval · Primary Index · Nest Level
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Juha Kärkkäinen (1)
  • Erkki Sutinen (1)
  1. Department of Computer Science, University of Helsinki, Finland