Linear Time Algorithm for the Generalised Longest Common Repeat Problem

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3772)

Abstract

Given a set of strings \(\mathcal{U} = \{T_{1}, T_{2}, \ldots, T_{\ell}\}\), the longest common repeat problem is to find the longest common substring that appears at least twice in each string of \(\mathcal{U}\), considering direct, inverted, mirror as well as everted repeats. In this paper we define the generalised longest common repeat problem, where we can set the number of times that a repeat should appear in each string. We present a linear time algorithm for this problem using the suffix array. We also show an application of our algorithm for finding a longest common substring which appears only in a subset \(\mathcal{U}^{\prime}\) of \(\mathcal{U}\) but not in \(\mathcal{U} \setminus \mathcal{U}^{\prime}\).
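To make the problem statement concrete, the following is a minimal sketch of the generalised problem for direct repeats only. It is not the linear time suffix array algorithm of the paper: it is a simple hashing baseline that binary-searches the answer length, which is valid because any prefix of a substring occurring at least k times also occurs at least k times. The names `generalised_lcr`, `common_repeats_of_length`, `count_substrings` and the parameter `min_counts` are hypothetical and chosen for illustration; overlapping occurrences are assumed to count.

```python
from collections import Counter


def count_substrings(text, length):
    """Count every (possibly overlapping) occurrence of each substring of `length`."""
    return Counter(text[i:i + length] for i in range(len(text) - length + 1))


def common_repeats_of_length(strings, min_counts, length):
    """Substrings of `length` that occur at least min_counts[i] times in strings[i]."""
    candidates = None
    for text, k in zip(strings, min_counts):
        frequent = {s for s, c in count_substrings(text, length).items() if c >= k}
        candidates = frequent if candidates is None else candidates & frequent
        if not candidates:
            return set()
    return candidates


def generalised_lcr(strings, min_counts):
    """Return one longest substring meeting every per-string occurrence threshold.

    Assumes `strings` is non-empty and every threshold is at least 1.
    Only direct repeats are considered (no inverted, mirror or everted repeats).
    Returns "" when no non-empty substring qualifies.
    """
    lo, hi = 0, min(len(t) for t in strings)
    best = ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = common_repeats_of_length(strings, min_counts, mid)
        if found:
            best = min(found)  # deterministic choice among ties
            lo = mid
        else:
            hi = mid - 1
    return best


if __name__ == "__main__":
    texts = ["abcabcab", "xabcabyab"]
    # Classic longest common repeat: at least twice in each string.
    print(generalised_lcr(texts, [2, 2]))
    # Generalised variant: twice in the first string, once in the second.
    print(generalised_lcr(texts, [2, 1]))
```

Setting every threshold to 2 gives the original longest common repeat problem, while other thresholds give the generalised variant. This baseline rebuilds a hash table of substrings for each tested length, so it is far from linear time; it is meant only as a reference against which a suffix array implementation could be checked.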

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, I., Ardila, Y.J.P. (2005). Linear Time Algorithm for the Generalised Longest Common Repeat Problem. In: Consens, M., Navarro, G. (eds) String Processing and Information Retrieval. SPIRE 2005. Lecture Notes in Computer Science, vol 3772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575832_21

  • DOI: https://doi.org/10.1007/11575832_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29740-6

  • Online ISBN: 978-3-540-32241-2

  • eBook Packages: Computer Science, Computer Science (R0)
