Compact Suffix Array
A suffix array is a data structure for indexing a large text file so that queries on its content can be answered quickly. In essence, a suffix array is an array of all suffixes of the text, sorted in lexicographic order. Whether a word occurs in the text can be decided in logarithmic time by binary search over the suffix array. In this work we present a method for compressing a suffix array such that the search time remains logarithmic. Our experiments show that in some cases our method reduces the total space requirement to about half of the original.
Keywords: Search Tree, Search Time, Binary Search, Lexicographic Order, Suffix Array
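To make the plain (uncompressed) suffix array from the abstract concrete, here is a minimal Python sketch. It is not the compression method this paper presents. The function names `build_suffix_array` and `find_occurrence` are hypothetical, chosen for illustration only. The construction sorts the suffixes naively, which is simple but far slower than the dedicated algorithms used in practice. The search performs a logarithmic number of comparisons, each of which may inspect up to as many characters as the pattern has.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction for illustration: sorts suffixes by direct comparison.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrence(text: str, sa: list[int], pattern: str) -> int:
    """Return one text position where pattern occurs, or -1 if it does not.

    Binary search for the first suffix whose leading len(pattern) characters
    are not smaller than pattern, then check whether it actually matches.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sa) and text[sa[lo]:sa[lo] + m] == pattern:
        return sa[lo]
    return -1


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)                                # [5, 3, 1, 0, 4, 2]
    print(find_occurrence(text, sa, "nan"))  # 2
    print(find_occurrence(text, sa, "xyz"))  # -1
```

The array stores only suffix start positions, not the suffixes themselves, so an index over a text of n characters holds n integers. That array of integers is the space cost a compression scheme would aim to reduce.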