
An Optimal Algorithm for Maximum-Sum Segment and Its Application in Bioinformatics

Extended Abstract
  • Tsai-Hung Fan
  • Shufen Lee
  • Hsueh-I Lu
  • Tsung-Shan Tsou
  • Tsai-Cheng Wang
  • Adam Yao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2759)

Abstract

We study a fundamental sequence algorithm arising from bioinformatics. Given two integers L and U and a sequence A of n numbers, the maximum-sum segment problem is to find a segment A[i,j] of A with $L \le j - i + 1 \le U$ that maximizes $A[i] + A[i+1] + \cdots + A[j]$. The problem has applications in finding repeats, designing low-complexity filters, and locating segments with rich C+G content in biomolecular sequences. The best previously known algorithm, due to Lin, Jiang, and Chao, runs in O(n) time and is based on a clever technique called the left-negative decomposition of A. In the present paper, we present a new O(n)-time algorithm that bypasses the left-negative decomposition. As a result, our algorithm can process the input sequence in an online manner, which is an important feature for coping with genome-scale sequences. We also show how to exploit sparsity in the input sequence: if A is representable in O(k) space in some format, then our algorithm runs in O(k) time. Moreover, a practical implementation of our algorithm running on the rice genome helped us identify a very long, previously unknown repeat structure in rice chromosome 1.
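
The length constraint can be made concrete with prefix sums. Writing P[k] = A[1] + ··· + A[k] with P[0] = 0, the sum of A[i,j] is P[j] − P[i−1], so for each right end j it suffices to find the smallest P[i−1] over the admissible starts j − U + 1 ≤ i ≤ j − L + 1. The Python sketch below uses this reduction together with a sliding-window minimum (a monotone deque). It also runs in O(n) time and reads the input one element at a time, but it is an illustrative stand-in, not the algorithm of this paper or of Lin, Jiang, and Chao. The function name max_sum_segment, its parameters lo and hi (standing for L and U), and the 0-based inclusive indexing are assumptions made for the example.

```python
from collections import deque
from typing import Iterable, Optional, Tuple


def max_sum_segment(a: Iterable[float], lo: int, hi: int) -> Optional[Tuple[float, int, int]]:
    """Hypothetical helper: return (best_sum, i, j) maximizing a[i] + ... + a[j]
    subject to lo <= j - i + 1 <= hi.

    Indices are 0-based and inclusive. Returns None if the input is shorter than lo.
    This is a prefix-sum / sliding-window-minimum sketch, not the paper's method.
    """
    if not 1 <= lo <= hi:
        raise ValueError("require 1 <= lo <= hi")
    prefix = [0]      # prefix[k] = a[0] + ... + a[k-1]
    window = deque()  # candidate start indices; their prefix values increase front to back
    best = None
    for x in a:       # consume the input one element at a time (online)
        prefix.append(prefix[-1] + x)
        end = len(prefix) - 1  # segment a[i..end-1] has sum prefix[end] - prefix[i]
        start = end - lo       # newest start index whose segment is now at least lo long
        if start < 0:
            continue
        # Starts with a prefix value >= prefix[start] can never be the minimum again.
        while window and prefix[window[-1]] >= prefix[start]:
            window.pop()
        window.append(start)
        # Discard starts that would make the segment longer than hi.
        while window[0] < end - hi:
            window.popleft()
        i = window[0]  # start with the smallest prefix value in the admissible window
        total = prefix[end] - prefix[i]
        if best is None or total > best[0]:
            best = (total, i, end - 1)
    return best


if __name__ == "__main__":
    print(max_sum_segment([1, -2, 3, 4, -1, 2], 2, 3))
```

Because each step only looks back at most U positions, the prefix list could in principle be replaced by a bounded buffer to keep memory proportional to U on genome-scale input; the sketch keeps the whole list for clarity. For a C+G application, one would first map each base to a numeric score (for example, positive for C or G and negative for A or T) before calling such a routine; the specific scoring is left as an assumption here.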

References

  [1] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215:403–410, 1990.
  [2] R. V. Davuluri, I. Grosse, and M. Q. Zhang. Computational identification of promoters and first exons in the human genome. Nature Genetics, 29:412–417, 2001.
  [3] S. Hannenhalli and S. Levy. Promoter prediction in the human genome. Bioinformatics, 17:S90–S96, 2001.
  [4] Y.-L. Lin, T. Jiang, and K.-M. Chao. Efficient algorithms for locating the length-constrained heaviest segments with applications to biomolecular sequence analysis. Journal of Computer and System Sciences, 65(3):570–586, 2002.
  [5] The Institute for Genomic Research (TIGR). Rice repeat database. http://www.tigr.org/tdb/e2k1/osa1/blastsearch.shtml.
  [6] E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249–260, 1995.
  [7] R. Y. Walder, M. R. Garrett, A. M. McClain, G. E. Beck, T. M. Brennan, N. A. Kramer, A. B. Kanis, A. L. Mark, J. P. Rapp, and V. C. Sheffield. Short tandem repeat polymorphic markers for the rat genome from marker-selected libraries associated with complex mammalian phenotypes. Mammalian Genome, 9:1013–1021, 1998.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Tsai-Hung Fan (1)
  • Shufen Lee (2)
  • Hsueh-I Lu (3)
  • Tsung-Shan Tsou (1)
  • Tsai-Cheng Wang (2)
  • Adam Yao (2)
  1. Institute of Statistics, National Central University, Chung-li, Taiwan, R.O.C.
  2. Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C.
  3. Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C.
