Abstract
This paper addresses the circular sequence comparison (CSC) problem, a fundamental step in many bioinformatics tasks that arises in a variety of biological contexts. Traditional algorithms for measuring approximation in sequence comparison are based on notions of distance or similarity and are generally computed through sequence alignment techniques. The CSC problem consists in finding all comparisons of the rotations of a pattern \(\mathcal {P}\) of length m in a text \(\mathcal {T}\) of length n. In CSC, we consider the comparisons with minimum distance from the circular pattern \(\mathcal {P}\) to the text \(\mathcal {T}\) under the Hamming distance model. We present a simple and fast filter-based algorithm to solve the CSC problem. Compared with state-of-the-art algorithms, our algorithm runs almost twice as fast. Much of this efficiency can be attributed to its filters, which are effective yet extremely simple and lightweight.
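To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the filter-based algorithm of the paper, only a naive baseline under one reading of the abstract: every rotation of the pattern is compared against every length-m window of the text, and the pairs with minimum Hamming distance are reported. The function names `hamming` and `naive_csc` and the return format are illustrative assumptions.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def naive_csc(pattern: str, text: str):
    """Brute-force baseline (hypothetical helper, not the paper's method).

    Tries every rotation of `pattern` at every offset of `text` and returns
    (best_distance, hits), where hits lists every
    (rotation_index, text_position) pair achieving best_distance.
    Takes O(m * m * n) time for pattern length m and text length n.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return None, []
    best, hits = m + 1, []
    doubled = pattern + pattern  # rotation r of the pattern is doubled[r:r + m]
    for r in range(m):
        rotation = doubled[r:r + m]
        for i in range(n - m + 1):
            d = hamming(rotation, text[i:i + m])
            if d < best:
                best, hits = d, [(r, i)]
            elif d == best:
                hits.append((r, i))
    return best, hits
```

For example, `naive_csc("abc", "xxcabxx")` finds the rotation `cab` (rotation index 2) at text position 2 with distance 0. A filter-based approach aims to discard most text windows cheaply before running any such exact comparison.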
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Azim, M.A.R., Kabir, M., Rahman, M.S. (2018). A Simple, Fast, Filter-Based Algorithm for Circular Sequence Comparison. In: Rahman, M., Sung, WK., Uehara, R. (eds) WALCOM: Algorithms and Computation. WALCOM 2018. Lecture Notes in Computer Science, vol 10755. Springer, Cham. https://doi.org/10.1007/978-3-319-75172-6_16
DOI: https://doi.org/10.1007/978-3-319-75172-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75171-9
Online ISBN: 978-3-319-75172-6
eBook Packages: Computer Science (R0)