Compact Suffix Array

  • Veli Mäkinen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1848)

Abstract

A suffix array is a data structure for indexing a large text file so that queries on its content can be answered quickly. In essence, a suffix array is an array of all suffixes of the text in lexicographic order. Whether a word occurs in the text can be determined in logarithmic time by binary search over the suffix array. In this work we present a method for compressing a suffix array such that the search time remains logarithmic. Our experiments show that in some cases our method compresses a suffix array so that the total space requirement is about half of the original.
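To make the abstract's description concrete, the following is a minimal sketch of a plain (uncompressed) suffix array with a binary-search membership query. It is not the compression method of the paper, which the abstract does not detail. The function names `build_suffix_array` and `occurs` are hypothetical, and the construction is a naive sort chosen for brevity rather than efficiency.

```python
def build_suffix_array(text):
    # Naive construction (an assumption for illustration): sort the start
    # positions of all suffixes by the suffix each position denotes.
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurs(text, sa, pattern):
    # Binary search for the leftmost suffix whose first len(pattern)
    # characters are >= pattern in lexicographic order.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # The pattern occurs iff that suffix actually starts with it.
    return lo < len(sa) and text.startswith(pattern, sa[lo])


text = "banana"
sa = build_suffix_array(text)
print(sa)                        # suffix start positions in sorted order
print(occurs(text, sa, "nan"))   # substring present
print(occurs(text, sa, "nab"))   # substring absent
```

The search performs a logarithmic number of steps in the text length, and each step compares at most `len(pattern)` characters.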

Keywords

Search Tree, Search Time, Binary Search, Lexicographic Order, Suffix Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Veli Mäkinen (1)
  1. Department of Computer Science, University of Helsinki, Finland
