Abstract
In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we define a measure to compare sequences that takes into account how the characters coming from different input sequences are mixed in the output string. Such a method is tested on a real data set for the whole mitochondrial genome phylogeny problem. However, the goal of this paper is to introduce a new and general methodology for automatic categorization of sequences.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Benedetto, D., Caglioti, E., Loreto, V.: Zipping out relevant information. Computing in Science and Engineering, 80–85 (2003)
Burrows, M., Wheeler, D.J.: A block sorting data compression algorithm. Technical report, DIGITAL System Research Center (1994)
Cao, Y., Janke, A., Waddell, P.J., Westerman, M., Takenaka, O., Murata, S., Okada, N., Pääbo, S., Hasegawa, M.: Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders. J. Mol. Evol. 47, 307–322 (1998)
Cilibrasi, R., Vitányi, P.: Clustering by compression. IEEE Trans. Information Theory 51(4), 1523–1545 (2005)
Crochemore, M., Désarménien, J., Perrin, D.: A note on the Burrows-Wheeler transformation. Theoret. Comput. Sci. 332, 567–572 (2005)
Ergun, F., Muthukrishnan, S., Sahinalp, C.: Comparing sequences with segment rearrangements. In: Pandya, P.K., Radhakrishnan, J. (eds.) FSTTCS 2003. LNCS, vol. 2914, pp. 183–194. Springer, Heidelberg (2003)
Fenwick, P.: The Burrows-Wheeler transform for block sorting text compression: principles and improvements. The Computer Journal 39(9), 731–740 (1996)
Gessel, I.M., Reutenauer, C.: Counting permutations with given cycle structure and descent set. J. Combin. Theory Ser. A 64(2), 189–215 (1993)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1997)
Ilie, L., Constantinescu, S.: Fine and Wilf’s theorem for any number of periods. TUCS (Turku Center for Computer Science) 25, 65–74 (2003); Proc. WORDS 2003
Larsson, N.J., Sadakane, K.: Faster suffix sorting. Technical Report LU-CS-TR:99-214, LUNDFD6/(NFCS-3140)/1-43/(1999), Department of Computer Science, Lund University, Sweden (1999)
Li, M., Badger, J.H., Chen, X., Kwong, S., Kearney, P., Zhang, H.: An information based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 17, 149–154 (2001)
Li, M., Chen, X., Li, X., Ma, B., Vitányi, P.: The similarity metric. IEEE Trans. Inform. Th. 12(5), 3250–3264 (2004)
Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press, Cambridge (2002)
Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing 22(5), 935–948 (1993)
Mantaci, S., Restivo, A., Rosone, G., Sciortino, M.: An Extension of the Burrows Wheeler Transform and Applications to Sequence Comparison and Data Compression. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 178–189. Springer, Heidelberg (2005)
Mantaci, S., Restivo, A., Sciortino, M.: Burrows-Wheeler transform and Sturmian words. Informat. Proc. Lett. 86, 241–246 (2003)
Mantaci, S., Restivo, A., Sciortino, M.: An extension of the Burrows-Wheeler Transform to k words. Technical Report 267, University of Palermo, Dipartimento di Matematica ed Appl. (December 2004)
Mantaci, S., Restivo, A., Sciortino, M.: An Extension of the Burrows Wheeler Transform to k Words (Extended Abstract). In: 2005 Data Compression Conference (DCC 2005), Snowbird, UT, USA, 29-31 March 2005, p. 469. IEEE Computer Society, Los Alamitos (2005)
Manzini, G.: The Burrows-Wheeler transform: Theory and practice. In: Kutyłowski, M., Wierzbicki, T., Pacholski, L. (eds.) MFCS 1999. LNCS, vol. 1672, pp. 34–47. Springer, Heidelberg (1999)
McCreight, E.M.: A space-economical suffix tree construction algorithm. Journal of the ACM 23(2), 262–272 (1976)
Mignosi, F., Restivo, A.: Periodicity. In: Lothaire, M. (ed.) Algebraic Combinatorics on Words, ch. 8, pp. 237–274. Cambridge University Press, Cambridge (2002)
Otu, H.H., Sayood, K.: A new sequence distance measure for phylogenetic tree construction. Bioinformatics 19(16), 2122–2130 (2003)
Tijdeman, R., Zamboni, L.: Fine and Wilf words for any periods. Indag. Math. 14(1), 135–147 (2003)
Vinga, S., Almeida, J.: Alignment-free sequence comparison – a review. Bioinformatics 19(4), 513–523 (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mantaci, S., Restivo, A., Rosone, G., Sciortino, M. (2005). A New Combinatorial Approach to Sequence Comparison. In: Coppo, M., Lodi, E., Pinna, G.M. (eds) Theoretical Computer Science. ICTCS 2005. Lecture Notes in Computer Science, vol 3701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11560586_28
Download citation
DOI: https://doi.org/10.1007/11560586_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29106-0
Online ISBN: 978-3-540-32024-1
eBook Packages: Computer ScienceComputer Science (R0)