Space Efficient Linear Time Computation of the Burrows and Wheeler-Transformation
Burrows and Wheeler describe a universal data compression algorithm (BW-algorithm, for short) that achieves compression rates close to the best known rates achieved in practice. Because of its simplicity, the algorithm can be implemented with relatively low complexity. Recently, Balkenhol, Kurtz and Shtarkov modified the BW-algorithm to improve the compression rate even further. For a thorough discussion of the information-theoretic background of the BW-algorithm and more references, see Balkenhol and Kurtz. The most time- and space-consuming part of the BW-algorithm is the Burrows and Wheeler-Transformation (BWT, for short), which permutes the input string in such a way that characters with a similar context are grouped together. It has been observed that, for an input string of length n, this transformation can be computed in O(n) time and space using suffix trees. However, suffix trees have a reputation for being very greedy for space, and therefore most researchers have resorted to alternative non-linear methods for computing the BWT. The Manber-Myers algorithm runs in O(n log n) worst-case time and requires 8n bytes of space. The algorithm of Bentley and Sedgewick is based on Quicksort. It is fast on average, but its worst-case running time is O(n^2). The Bentley-Sedgewick algorithm requires 4n bytes. Its running time can be improved in practice at the cost of 4n extra bytes. Recently, Sadakane showed how to combine the Manber-Myers algorithm with the Bentley-Sedgewick algorithm to obtain a method that runs in O(n log n) worst-case time and uses 9n bytes.
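To make the transformation concrete, the following is a minimal Python sketch of the BWT and its inverse. It is not the suffix-tree method discussed above: it simply sorts all suffixes, so it is neither linear in time nor space-efficient. The function names `bwt` and `inverse_bwt` and the use of `$` as an end-of-string sentinel are assumptions made for illustration only.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all suffixes of text + sentinel and output, for each
    suffix in sorted order, the character that precedes it (cyclically)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Illustrative only: comparing suffix slices is far from linear time.
    order = sorted(range(len(s)), key=lambda i: s[i:])
    # s[i - 1] with i == 0 wraps around to the sentinel at the end.
    return "".join(s[i - 1] for i in order)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Naive inverse: rebuild the sorted rotation table one column at a time."""
    table = [""] * len(last)
    for _ in range(len(last)):
        # Prepend the transformed column to every row, then re-sort the rows.
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
```

In the output for "banana", the two occurrences of "n" that precede "a" end up adjacent, which illustrates how characters with a similar context are grouped together.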
Keywords: Head Position, Large Node, Implementation Technique, Suffix Tree, Input String
- B. Balkenhol, S. Kurtz, “Universal Data Compression Based on the Burrows and Wheeler Transformation: Theory and Practice”, Technical Report, Sonderforschungsbereich: Diskrete Strukturen in der Mathematik, Universität Bielefeld, 98–069, 1998. http://www.mathematik.unibielefeld.de/sfb343/preprints/.
- B. Balkenhol, S. Kurtz and Y. Shtarkov, “Modification of the Burrows and Wheeler Data Compression Algorithm”, In Proceedings of the IEEE Data Compression Conference, Snowbird, Utah, IEEE Computer Society Press, 1999, 188–197.
- J. Bentley, R. Sedgewick, “Fast Algorithms for Sorting and Searching Strings”, In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1997, 360–369. http://www.cs.princeton.edu/~rs/strings/.
- M. Burrows, D. Wheeler, “A Block-Sorting Lossless Data Compression Algorithm”, Research Report 124, Digital Systems Research Center, 1994. http://www.gatekeeper.dec.com/pub/DEC/SRC/researchreports/abstracts/src-rr-124.html.
- M. Farach, “Optimal Suffix Tree Construction with Large Alphabets”, In Proceedings of the 38th Annual Symposium on the Foundations of Computer Science, FOCS 97, New York, IEEE Comput. Soc. Press, 1997. http://www.cs.rutgers.edu/pub/farach/Suffix.ps.Z.
- S. Kurtz, “Reducing the Space Requirement of Suffix Trees”, Report 98–03, Technische Fakultät, Universität Bielefeld, 1998. http://www.TechFak.Uni-Bielefeld.DE/techfak/~kurtz/publications.html.
- N. Larsson, “The Context Trees of Block Sorting Compression”, In Proceedings of the IEEE Data Compression Conference, Snowbird, Utah, March 30–April 1, IEEE Computer Society Press, 1998, 189–198.
- K. Sadakane, “A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation”, In Proceedings of the IEEE Data Compression Conference, Snowbird, Utah, March 30–April 1, IEEE Computer Society Press, 1998, 129–138.
- E. Ukkonen, “On-line Construction of Suffix-Trees”, Algorithmica, 14 (3), 1995.
- P. Weiner, “Linear Pattern Matching Algorithms”, In Proceedings of the 14th IEEE Annual Symposium on Switching and Automata Theory, The University of Iowa, 1973, 1–11.