A Simple, Fast, Filter-Based Algorithm for Circular Sequence Comparison

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10755)

Abstract

This paper deals with the circular sequence comparison (CSC) problem, a fundamental step in many bioinformatics tasks that arises in a range of biological contexts. Traditional algorithms for measuring approximation in sequence comparison are based on the notions of distance or similarity, which are generally computed through sequence alignment techniques. The CSC problem consists in finding all comparisons of the rotations of a pattern \(\mathcal {P}\) of length m in a text \(\mathcal {T}\) of length n. In CSC, we consider the comparisons with minimum distance from the circular pattern \(\mathcal {P}\) to the text \(\mathcal {T}\) under the Hamming distance model. We present a simple and fast filter-based algorithm to solve the CSC problem. Compared with state-of-the-art algorithms, our algorithm runs almost twice as fast. Much of this efficiency can be attributed to its filters, which are effective yet extremely simple and lightweight.
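
To make the problem statement concrete, the following is a minimal brute-force sketch in Python of one reading of CSC under the Hamming distance model: it compares every rotation of the pattern with every length-m window of the text and keeps the alignments of minimum distance. It is only a reference baseline, not the filter-based algorithm of the paper, and the names `hamming` and `naive_csc` are hypothetical, introduced here for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def naive_csc(pattern: str, text: str):
    """Brute-force circular comparison under Hamming distance.

    Returns (best_distance, hits), where hits is a sorted list of
    (text_position, rotation_offset) pairs attaining best_distance.
    Illustrative baseline only; no filtering is performed.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        raise ValueError("need 0 < len(pattern) <= len(text)")
    # Every rotation of the pattern is a length-m window of pattern+pattern.
    doubled = pattern + pattern
    best, hits = m + 1, []
    for r in range(m):                    # rotation offset
        rotation = doubled[r:r + m]
        for i in range(n - m + 1):        # window start in the text
            d = hamming(rotation, text[i:i + m])
            if d < best:
                best, hits = d, [(i, r)]
            elif d == best:
                hits.append((i, r))
    return best, sorted(hits)


print(naive_csc("ACG", "TTCGATT"))  # (0, [(2, 1)])
```

Here the rotation `CGA` (offset 1) occurs exactly at text position 2. The sketch performs on the order of n·m² character comparisons, which is the kind of cost that a filtering step aims to avoid by discarding most text positions before any verification.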

Notes

  1. https://goo.gl/bKZ52e

Author information

Corresponding author

Correspondence to Md. Aashikur Rahman Azim.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Azim, M.A.R., Kabir, M., Rahman, M.S. (2018). A Simple, Fast, Filter-Based Algorithm for Circular Sequence Comparison. In: Rahman, M., Sung, WK., Uehara, R. (eds) WALCOM: Algorithms and Computation. WALCOM 2018. Lecture Notes in Computer Science, vol 10755. Springer, Cham. https://doi.org/10.1007/978-3-319-75172-6_16

  • DOI: https://doi.org/10.1007/978-3-319-75172-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75171-9

  • Online ISBN: 978-3-319-75172-6

  • eBook Packages: Computer Science, Computer Science (R0)
