Variants of the Burrows-Wheeler Transform
With the huge excitement that was generated by the publication of the original paper on the Burrows-Wheeler Transform in 1994, followed by a more detailed empirical study by Fenwick between 1995 and 1996 (Fenwick, 1995b,c, 1996a,b), it did not take long before researchers started considering different variations, extensions and generalizations of the transform. There were many questions to ask; for instance, given the sorted BWT rotation matrix, is the array of last characters (the last column L of the matrix As) selected by the BWT as its output the only possible choice? And if other choices are possible, might they give better compression? The first column (F) would be an attractive choice if it were possible to recover the original text from this column, since it can be represented very efficiently. It would seem that there is insufficient information to recover the text T from F, and we know how to recover it from L, but what of the columns between them?
Another debate was about the transformation itself. Do we need a complete sorting of all the cyclic rotations of the original text, or can we make do with a limited-length key comparison — for instance, sorting based on the klength prefix of each row, for an arbitrary k? Can we recover the original text without error from such limited-order sorting? Given that sorting is the major bottleneck in BWT-based analysis, if this simplified sort were possible it could have a significant advantage with respect to computational complexity; but what will be the impact on compression? Other questions included whether the BWT can be applied to a word-based alphabet, especially given that the original paper that proposed the MTF algorithm for compression (Bentley et al., 1986) used word-based alphabets. In this chapter we address the above issues and more by considering various published extensions and generalizations of the BWT. Where possible, we include empirical performance of the BWT variant or generalization.
KeywordsCompression Performance Input String Text String Identity Permutation Large Alphabet
Unable to display preview. Download preview PDF.