Flexible and Efficient Algorithms for Abelian Matching in Genome Sequence

Faro, Simone; Pavone, Arianna

doi:10.1007/978-3-030-17938-0_28

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11465)

Included in the following conference series:

International Work-Conference on Bioinformatics and Biomedical Engineering

Abstract

Approximate matching in strings is a fundamental and challenging problem in computer science and in computational biology, and ever faster algorithms are in high demand in many applications, including text processing and DNA sequence analysis. Recently, efficient solutions to specific approximate matching problems on genomic sequences have been designed using a filtering technique based on the general abelian matching problem: the filter first locates the set of all candidate matching positions and then performs an additional verification test on the collected positions.
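The filter-then-verify scheme described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: the filter is a plain sliding-window character count, and the verification test (a hypothetical choice made only because it is simple) accepts a candidate window if it equals the pattern up to non-overlapping swaps of adjacent characters. Any such occurrence has the same character counts as the pattern, so the abelian filter never discards a true occurrence.

```python
from collections import Counter


def _bump(counts, c, delta):
    """Adjust the count of c and drop zero entries so dict equality stays exact."""
    counts[c] += delta
    if counts[c] == 0:
        del counts[c]


def abelian_candidates(text, pattern):
    """Filtering phase: start positions of windows that are permutations of pattern."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    need = Counter(pattern)
    window = Counter(text[:m])
    out = [0] if window == need else []
    for s in range(1, n - m + 1):
        # Slide one position: text[s-1] leaves the window, text[s+m-1] enters it.
        _bump(window, text[s - 1], -1)
        _bump(window, text[s + m - 1], +1)
        if window == need:
            out.append(s)
    return out


def swap_verify(window, pattern):
    """Hypothetical verification test: window equals pattern up to
    non-overlapping swaps of adjacent characters (greedy scan)."""
    i, m = 0, len(pattern)
    while i < m:
        if window[i] == pattern[i]:
            i += 1
        elif i + 1 < m and window[i] == pattern[i + 1] and window[i + 1] == pattern[i]:
            i += 2  # positions i and i+1 are consumed by one swap
        else:
            return False
    return True


def swap_occurrences(text, pattern):
    """Verification phase, run only on the positions kept by the filter."""
    m = len(pattern)
    return [s for s in abelian_candidates(text, pattern)
            if swap_verify(text[s:s + m], pattern)]


text, pattern = "TTCAGTACTGAA", "ACGT"
print(abelian_candidates(text, pattern))  # [1, 2, 4, 6, 7]
print(swap_occurrences(text, pattern))    # [2, 6]
```

The greedy scan in `swap_verify` is safe: if a position already matches and a swap would also be possible there, all four characters involved are equal, so matching directly never loses an occurrence.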

The abelian pattern matching problem consists of finding all substrings of a text that are permutations of a given pattern. In this paper we present a new class of algorithms based on a new, efficient fingerprint computation approach, called Heap-Counting, which turns out to be fast, flexible and easy to implement. We prove that, when applied to searching for short patterns in a DNA sequence, our solutions have linear worst-case time complexity. In addition, we present an experimental evaluation showing that our new algorithms are among the most efficient and flexible solutions in practice for the abelian matching problem in DNA sequences.
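The idea of a window fingerprint can be illustrated with the following Python sketch. It is not the Heap-Counting procedure, whose details are not reproduced here; it uses a simpler positional encoding that we assume purely for illustration. With the DNA alphabet ranked A, C, G, T as 0 to 3 (an assumed ordering), each character c receives the weight (m+1)^rank(c), so the fingerprint of a length-m window is the sum over c of count(c) · (m+1)^rank(c). Since every count is at most m, this is the base-(m+1) representation of the window's character counts: two windows have equal fingerprints exactly when one is a permutation of the other, and sliding the window costs one addition and one subtraction, for O(n + m) time overall.

```python
# Assumed alphabet and ranking; any fixed bijection onto 0..3 works.
DNA_RANK = {"A": 0, "C": 1, "G": 2, "T": 3}


def fingerprint_abelian(text, pattern):
    """Start positions of length-m windows of text that are permutations of pattern.

    Illustrative fingerprint: the sum of (m+1)**rank(c) over the window, i.e. the
    base-(m+1) encoding of its character counts (each count is at most m).
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    base = m + 1
    weight = {c: base ** r for c, r in DNA_RANK.items()}
    target = sum(weight[c] for c in pattern)
    f = sum(weight[c] for c in text[:m])
    occ = [0] if f == target else []
    for s in range(1, n - m + 1):
        # O(1) update: add the entering character, subtract the leaving one.
        f += weight[text[s + m - 1]] - weight[text[s - 1]]
        if f == target:
            occ.append(s)
    return occ


print(fingerprint_abelian("GATTACA", "TA"))  # [1, 3]
```

With fixed 64-bit words the largest fingerprint, m(m+1)^3, stays below (m+1)^4 ≤ 2^64 whenever m ≤ 65535, so short patterns need no overflow handling. Characters outside {A, C, G, T} raise a `KeyError` in this sketch.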

Notes

1. The Smart tool is available online at https://smart-tool.github.io/smart/.

Author information

Authors and Affiliations

Dipartimento di Matematica e Informatica, Università di Catania, Viale Andrea Doria 6, 95125, Catania, Italy
Simone Faro
Dipartimento di Scienze Cognitive, Università di Messina, Via Concezione 6, 98122, Messina, Italy
Arianna Pavone

Corresponding author

Correspondence to Simone Faro.

Editor information

Editors and Affiliations

Department of Computer Architecture and Computer Technology Higher Technical School of Information Technology and Telecommunications Engineering, CITIC-UGR, Granada, Spain
Ignacio Rojas
ETSIIT, University of Granada, Granada, Spain
Olga Valenzuela
CITIC-UGR, University of Granada, Granada, Spain
Fernando Rojas
Fundacion Progreso y Salud, Granada, Spain
Francisco Ortuño

About this paper

Cite this paper

Faro, S., Pavone, A. (2019). Flexible and Efficient Algorithms for Abelian Matching in Genome Sequence. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science, vol 11465. Springer, Cham. https://doi.org/10.1007/978-3-030-17938-0_28

DOI: https://doi.org/10.1007/978-3-030-17938-0_28
Published: 13 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17937-3
Online ISBN: 978-3-030-17938-0
eBook Packages: Computer Science, Computer Science (R0)