An Extension of the Burrows Wheeler Transform and Applications to Sequence Comparison and Data Compression

  • Sabrina Mantaci
  • Antonio Restivo
  • G. Rosone
  • Marinella Sciortino
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3537)

Abstract

We introduce a generalization of the Burrows-Wheeler Transform (BWT) that can be applied to a multiset of words. The extended transformation, denoted by E, is reversible and, unlike the BWT, also surjective. The E transformation makes it possible to define a distance between two sequences, which we apply here to the problem of whole mitochondrial genome phylogeny. Moreover, we give some considerations on compressing a set of words by using the E transformation as a preprocessing step.
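The abstract does not spell out how E is computed. As a rough illustration only, the Python sketch below assumes the usual formulation of such an extension: take every conjugate (cyclic rotation) of every word in the multiset, sort the conjugates by comparing their infinite periodic repetitions, and read off the last letter of each. The function names `conjugates` and `ebwt` are hypothetical, the input words are assumed to be primitive, and the sketch omits the extra index information that inverting the transformation would need.

```python
def conjugates(word):
    """Return all cyclic rotations (conjugates) of a word."""
    return [word[i:] + word[:i] for i in range(len(word))]


def ebwt(words):
    """Sketch of an extended BWT over a multiset of words.

    Assumption (not stated in the abstract): conjugates are ordered by
    comparing their infinite repetitions u^omega and v^omega. Two such
    repetitions that differ must differ within the first |u| + |v|
    letters, so prefixes of length 2 * max_len decide the order.
    """
    max_len = max(len(w) for w in words)
    limit = 2 * max_len

    def omega_key(rotation):
        # Repeat the rotation often enough to cover `limit` letters.
        reps = limit // len(rotation) + 1
        return (rotation * reps)[:limit]

    rotations = [r for w in words for r in conjugates(w)]
    rotations.sort(key=omega_key)
    # The output collects the last letter of each sorted conjugate.
    return "".join(r[-1] for r in rotations)


if __name__ == "__main__":
    print(ebwt(["abac", "cbab", "bca", "cba"]))  # prints ccbbbcacaaabba
    print(ebwt(["banana"]))                      # prints nnbaaa
```

On a single word the sketch reduces to the classical rotation-sorting BWT, which is consistent with E being described as a generalization.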

Keywords

Conjugacy Class, Cyclic Sequence, Original Word, Extended Transformation, Primitive Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Sabrina Mantaci (1)
  • Antonio Restivo (1)
  • G. Rosone (1)
  • Marinella Sciortino (1)
  1. Dipartimento di Matematica ed Applicazioni, University of Palermo, Palermo, Italy
