An Extension of the Burrows Wheeler Transform and Applications to Sequence Comparison and Data Compression
We introduce a generalization of the Burrows-Wheeler Transform (BWT) that can be applied to a multiset of words. The extended transformation, denoted by E, is reversible and, unlike the BWT, also surjective. The E transformation makes it possible to define a distance between two sequences, which we apply here to the problem of whole mitochondrial genome phylogeny. Moreover, we give some considerations on compressing a set of words by using the E transformation as a preprocessing step.
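The abstract does not spell out how E is built, so the following is only a minimal sketch under stated assumptions. It assumes the usual construction for a BWT over a multiset: take every conjugate (cyclic rotation) of every word, sort the conjugates by comparing their infinite repetitions, and output the last letter of each sorted conjugate together with the positions of the original words. The function name `ebwt`, the tie-breaking rule, and the 0-based positions are illustrative choices, not taken from the paper.

```python
from math import ceil


def ebwt(words):
    """Sketch of an extended BWT over a list (multiset) of non-empty words.

    Assumption: conjugates are ordered by comparing their infinite
    repetitions u^ω; this is not stated in the abstract itself.
    """
    if not words or any(not w for w in words):
        raise ValueError("expected a non-empty list of non-empty words")

    # A prefix of length 2 * max_len of u^ω is enough to order any pair:
    # by the Fine and Wilf theorem, two infinite repetitions that agree on
    # |u| + |v| characters are equal.
    horizon = 2 * max(len(w) for w in words)

    def omega_key(u):
        # Finite prefix of the infinite repetition of u.
        return (u * ceil(horizon / len(u)))[:horizon]

    rows = []
    for word_id, w in enumerate(words):
        for shift in range(len(w)):
            conj = w[shift:] + w[:shift]  # rotation of w starting at `shift`
            rows.append((omega_key(conj), word_id, shift, conj))

    # Ties (identical infinite repetitions) are broken by word index and
    # shift; this rule is an arbitrary choice made for the sketch.
    rows.sort(key=lambda r: (r[0], r[1], r[2]))

    last_column = "".join(conj[-1] for _, _, _, conj in rows)
    # 0-based positions of the original (unrotated) words in the sorted list.
    original_positions = [i for i, r in enumerate(rows) if r[2] == 0]
    return last_column, original_positions


print(ebwt(["ab", "abb"]))   # ('bbaba', [0, 1])
print(ebwt(["banana"]))      # ('nnbaaa', [3])
```

For a single word the sketch reduces to the ordinary BWT: the last column of the sorted rotations plus the row of the original word. With words of different lengths, plain lexicographic sorting of the rotations would not be well defined for this purpose, which is why the comparison is made on repeated words instead.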
Keywords: Conjugacy Class, Cyclic Sequence, Original Word, Extended Transformation, Primitive Word