How the Burrows-Wheeler Transform works
This chapter will look in detail at how the Burrows-Wheeler Transform is implemented in practice. The examples given in Chapter 1 overlooked some important practical details — to transform a text of n characters the encoder was sorting an array of n strings, each n characters long, and the decoder performed n sorts to reverse the transform. This complexity is not necessary for the BWT, and in this chapter we will see how to perform the encoding and decoding in O(n) space, and O(n log n) time. In fact, using a few tricks, the time can be reduced to O(n).
We will also look at various auxiliary data structures that are used for decoding the Burrows-Wheeler Transform, as some of them, while not essential for decoding, are useful if the transformed text is to be searched. These extra structures can still be constructed in O(n) time so in principle they add little to the decoding cost.
This chapter considers only the transform; in the next chapter we will look at how a compression system can take advantage of the transformed text to reduce its size; we refer to this second phase as the “Local to Global Transform”, or LGT.
We will present the Burrows-Wheeler Transform for coding a string T of n characters, T[1 … n], over an alphabet Σ of |Σ| characters. Note that there is a summary of all the main notation in Appendix A on page 309.
KeywordsPattern Match Original Text Input Text Input String Memory Access Pattern
Unable to display preview. Download preview PDF.