Abstract
Given a set of strings \(\mathcal{U} = \{T_{1}, T_{2}, \ldots, T_{\ell}\}\), the longest common repeat problem is to find the longest common substring that appears at least twice in each string of \(\mathcal{U}\), considering direct, inverted, mirror as well as everted repeats. In this paper we define the generalised longest common repeat problem, where we can set the number of times that a repeat should appear in each string. We present a linear time algorithm for this problem using the suffix array. We also show an application of our algorithm for finding a longest common substring which appears only in a subset \(\mathcal{U}^{\prime}\) of \(\mathcal{U}\) but not in \(\mathcal{U} \setminus \mathcal{U}^{\prime}\).
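To make the problem statement concrete, the following is a minimal brute-force sketch of the generalised problem, not the linear-time suffix-array algorithm the paper presents. It rests on several assumptions that are not taken from the paper: only direct repeats are considered, overlapping occurrences are counted, ties between equally long answers are broken lexicographically, and the names `count_substrings`, `generalised_lcr`, and `min_counts` are hypothetical.

```python
from collections import Counter


def count_substrings(text, length):
    """Count every occurrence (overlaps included) of each substring of `length`."""
    return Counter(text[i:i + length] for i in range(len(text) - length + 1))


def generalised_lcr(strings, min_counts):
    """Return one longest substring that occurs at least min_counts[i] times
    in strings[i] for every i, or "" if no such substring exists.

    Illustrative brute force only: direct repeats, far from linear time.
    """
    if not strings or len(strings) != len(min_counts):
        raise ValueError("need one minimum count per input string")

    # Try candidate lengths from longest to shortest; the first hit is optimal.
    for length in range(min(len(s) for s in strings), 0, -1):
        candidates = None
        for text, k in zip(strings, min_counts):
            counts = count_substrings(text, length)
            frequent = {w for w, c in counts.items() if c >= k}
            # Keep only substrings that satisfy every string seen so far.
            candidates = frequent if candidates is None else candidates & frequent
            if not candidates:
                break
        if candidates:
            return min(candidates)  # assumed tie-break: lexicographically smallest
    return ""


if __name__ == "__main__":
    # "abc" must occur twice in the first two strings and once in the third.
    print(generalised_lcr(["abcabcab", "xabcyabc", "abcab"], [2, 2, 1]))
```

Under these assumptions, setting every entry of `min_counts` to 2 corresponds to the original longest common repeat problem restricted to direct repeats, and setting every entry to 1 gives the ordinary longest common substring.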
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, I., Ardila, Y.J.P. (2005). Linear Time Algorithm for the Generalised Longest Common Repeat Problem. In: Consens, M., Navarro, G. (eds) String Processing and Information Retrieval. SPIRE 2005. Lecture Notes in Computer Science, vol 3772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575832_21
DOI: https://doi.org/10.1007/11575832_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29740-6
Online ISBN: 978-3-540-32241-2
eBook Packages: Computer Science (R0)