Space Efficient Linear Time Computation of the Burrows and Wheeler-Transformation

  • Stefan Kurtz
  • Bernhard Balkenhol
Chapter

Abstract

In [4] a universal data compression algorithm (BW-algorithm, for short) is described which achieves compression rates that are close to the best known rates achieved in practice. Due to its simplicity, the algorithm can be implemented with relatively low complexity. Recently, [2] modified the BW-algorithm to improve the compression rate even further. For a thorough discussion of the information theoretic background of the BW-algorithm and more references, see [1]. The most time and space consuming part of the BW-algorithm is the Burrows and Wheeler-Transformation (BWT, for short), which permutes the input string in such a way that characters with a similar context are grouped together. In [4], it was observed that for an input string of length n, this transformation can be computed in O(n) time and space using suffix trees. However, suffix trees have a reputation of being very greedy for space, and therefore most researchers resorted to alternative non-linear methods for computing the BWT: The Manber-Myers algorithm [9] runs in O(n log n) worst case time and requires 8n bytes of space. The Bentley-Sedgewick algorithm [3] is based on Quicksort. It is fast on average, but its worst case running time is O(n^2). It requires 4n bytes of space, and its running time can be improved in practice at the cost of 4n extra bytes. Recently, [11] showed how to combine the Manber-Myers algorithm with the Bentley-Sedgewick algorithm to obtain a method that runs in O(n log n) worst case time and uses 9n bytes.
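
To make the transformation concrete, the following is a minimal Python sketch of the BWT and its inverse. It is not the linear-time suffix tree method this chapter is about: it sorts suffixes by direct comparison, which is simple but far from O(n) in the worst case. The function names `bwt` and `inverse_bwt` and the use of `"\0"` as an end-of-string sentinel are assumptions made for this illustration.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive BWT: sort the suffixes of text + sentinel and emit, for each
    suffix in sorted order, the character that precedes it."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Direct suffix comparison; worst case is far from linear time.
    order = sorted(range(len(s)), key=lambda i: s[i:])
    # For i == 0, s[i - 1] wraps around to the sentinel at the end.
    return "".join(s[i - 1] for i in order)


def inverse_bwt(last: str, sentinel: str = "\0") -> str:
    """Recover the original text from the output of bwt()."""
    n = len(last)
    # A stable sort of positions by character links each sorted row
    # to the row that starts one character later in the text.
    order = sorted(range(n), key=lambda i: last[i])
    # The row that ends with the sentinel is the original string itself.
    j = last.index(sentinel)
    out = []
    for _ in range(n - 1):
        j = order[j]
        out.append(last[j])
    return "".join(out)


if __name__ == "__main__":
    transformed = bwt("banana")
    print(repr(transformed))
    print(inverse_bwt(transformed))
```

For the input `banana`, the sketch produces `annb`, then the sentinel, then `aa`: the characters preceding the suffixes that start with `a` end up next to each other, which is the grouping by context that the later compression stages exploit.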

Keywords

Head Position, Large Node, Implementation Technique, Suffix Tree, Input String
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. [1]
    B. Balkenhol, S. Kurtz, “Universal Data Compression Based on the Burrows and Wheeler Transformation: Theory and Practice”, Technical Report, Sonderforschungsbereich: Diskrete Strukturen in der Mathematik, Universität Bielefeld, 98–069, 1998, http://www.mathematik.unibielefeld.de/sfb343/preprints/.
  2. [2]
    B. Balkenhol, S. Kurtz and Y. Shtarkov, “Modification of the Burrows and Wheeler Data Compression Algorithm”, In Proceedings of the IEEE Data Compression Conference, Snowbird, Utah, IEEE Computer Society Press, 1999, 188–197.
  3. [3]
    J. Bentley, R. Sedgewick, “Fast Algorithms for Sorting and Searching Strings”, In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1997, 360–369. http://www.cs.princeton.edu/~rs/strings/.
  4. [4]
    M. Burrows, D. Wheeler, “A Block-Sorting Lossless Data Compression Algorithm”, Research Report 124, Digital Systems Research Center, 1994 http://www.gatekeeper.dec.com/pub/DEC/SRC/researchreports/abstracts/src-rr-124.html.
  5. [5]
    M. Farach, “Optimal Suffix Tree Construction with Large Alphabets”. In Proceedings of the 38th Annual Symposium on the Foundations of Computer Science, FOCS 97, New York. IEEE Comput. Soc. Press, 1997. http://www.cs.rutgers.edu/pub/farach/Suffix.ps.Z.
  6. [6]
    R. Giegerich, S. Kurtz, “From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction”. Algorithmica, 19, 1997, 331–353.
  7. [7]
    S. Kurtz, “Reducing the Space Requirement of Suffix Trees”. Report 98–03, Technische Fakultät, Universität Bielefeld, 1998. http://www.TechFak.Uni-Bielefeld.DE/techfak/~kurtz/publications.html.
  8. [8]
    N. Larsson, “The Context Trees of Block Sorting Compression”. In Proceedings of the IEEE Data Compression Conference, Snowbird, Utah, March 30–April 1, IEEE Computer Society Press, 1998, 189–198.
  9. [9]
    U. Manber, E. Myers, “Suffix Arrays: A New Method for On-Line String Searches”, SIAM Journal on Computing, 22 (5), 1993, 935–948.
  10. [10]
    E. McCreight, “A Space-Economical Suffix Tree Construction Algorithm”, Journal of the ACM, 23 (2), 1976, 262–272.
  11. [11]
    K. Sadakane, “A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation”. In Proceedings of the IEEE Data Compression Conference, Snowbird, Utah, March 30–April 1, IEEE Computer Society Press, 1998, 129–138.
  12. [12]
    E. Ukkonen, “On-line Construction of Suffix-Trees”, Algorithmica, 14 (3), 1995.
  13. [13]
    P. Weiner, “Linear Pattern Matching Algorithms”. In Proceedings of the 14th IEEE Annual Symposium on Switching and Automata Theory, The University of Iowa, 1973, 1–11.

Copyright information

© Springer Science+Business Media New York 2000

Authors and Affiliations

  • Stefan Kurtz (1)
  • Bernhard Balkenhol (2)
  1. Technische Fakultät, Univ. Bielefeld, Bielefeld, Germany
  2. Fakultät für Mathematik, Univ. Bielefeld, Bielefeld, Germany
