Linear Time Algorithm for the Generalised Longest Common Repeat Problem

Lee, Inbok; Ardila, Yoan José Pinzón

doi:10.1007/11575832_21

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3772)

Abstract

Given a set of strings \(\mathcal{U} = \{T_{1}, T_{2}, \ldots, T_{\ell}\}\), the longest common repeat problem is to find the longest common substring that appears at least twice in each string of \(\mathcal{U}\), considering direct, inverted, mirror as well as everted repeats. In this paper we define the generalised longest common repeat problem, where we can set the number of times that a repeat should appear in each string. We present a linear time algorithm for this problem using the suffix array. We also show an application of our algorithm for finding a longest common substring which appears only in a subset \(\mathcal{U}^{\prime}\) of \(\mathcal{U}\) but not in \(\mathcal{U} \setminus \mathcal{U}^{\prime}\).
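
To make the problem statement concrete, the following is a minimal brute-force sketch of the generalised problem for direct repeats only. It is not the linear time suffix array algorithm of the paper, which is not reproduced in the abstract. The function names, the `min_counts` parameter (the required number of occurrences per string), the counting of overlapping occurrences, and the lexicographic tie-break are all assumptions made for illustration. The sketch relies on the observation that if a substring occurs often enough in every string, so does each of its prefixes, which allows a binary search on the length.

```python
from collections import Counter


def substrings_with_min_count(text, length, min_count):
    """Substrings of `text` of the given length occurring at least
    `min_count` times (overlapping occurrences are counted)."""
    counts = Counter(text[i:i + length] for i in range(len(text) - length + 1))
    return {s for s, c in counts.items() if c >= min_count}


def generalised_lcr_naive(strings, min_counts):
    """Longest substring occurring at least min_counts[i] times in strings[i].

    Illustrative brute force (direct repeats only), far from linear time.
    Assumes every entry of min_counts is at least 1.
    Returns "" when no non-empty substring qualifies.
    """
    if not strings:
        return ""
    best = ""
    lo, hi = 0, min(len(t) for t in strings)
    # Feasibility is monotone in the length, so binary search applies.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        common = None
        for text, k in zip(strings, min_counts):
            found = substrings_with_min_count(text, mid, k)
            common = found if common is None else common & found
            if not common:
                break
        if common:
            best = min(common)  # assumed tie-break: lexicographically smallest
            lo = mid
        else:
            hi = mid - 1
    return best


if __name__ == "__main__":
    # "abc" occurs at least twice in both strings; no longer substring does.
    print(generalised_lcr_naive(["abcabcabc", "zabcxabcy"], [2, 2]))
```

Setting every entry of `min_counts` to 2 corresponds to the original longest common repeat problem restricted to direct repeats, and setting every entry to 1 gives the ordinary longest common substring.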

Author information

Authors and Affiliations

Dept. of Computer Science, King’s College London, London, WC2R 2LS, United Kingdom
Inbok Lee & Yoan José Pinzón Ardila

Editor information

Editors and Affiliations

University of Toronto
Mariano Consens
Dept. of Computer Science, University of Chile
Gonzalo Navarro

About this paper

Cite this paper

Lee, I., Ardila, Y.J.P. (2005). Linear Time Algorithm for the Generalised Longest Common Repeat Problem. In: Consens, M., Navarro, G. (eds) String Processing and Information Retrieval. SPIRE 2005. Lecture Notes in Computer Science, vol 3772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575832_21

DOI: https://doi.org/10.1007/11575832_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29740-6
Online ISBN: 978-3-540-32241-2
eBook Packages: Computer Science, Computer Science (R0)
