An Optimal Algorithm for Maximum-Sum Segment and Its Application in Bioinformatics
We study a fundamental sequence problem arising from bioinformatics. Given two integers L and U and a sequence A of n numbers, the maximum-sum segment problem is to find a segment A[i,j] of A with L ≤ j−i+1 ≤ U that maximizes A[i]+A[i+1]+···+A[j]. The problem has applications in finding repeats, designing low-complexity filters, and locating segments with rich C+G content in biomolecular sequences. The best known algorithm, due to Lin, Jiang, and Chao, runs in O(n) time and is based on a clever technique called left-negative decomposition of A. In the present paper, we give a new O(n)-time algorithm that bypasses the left-negative decomposition. As a result, our algorithm can handle the input sequence in an online manner, an important feature for coping with genome-scale sequences. We also show how to exploit sparsity in the input sequence: if A is representable in O(k) space in some format, then our algorithm runs in O(k) time. Moreover, a practical implementation of our algorithm, run on the rice genome, helped us identify a very long repeat structure in rice chromosome 1 that was previously unknown.
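To make the problem statement concrete, the following is a minimal sketch of one standard way to solve the length-constrained maximum-sum segment problem in O(n) time, using prefix sums and a monotonic deque of candidate start positions. It is not claimed to be the algorithm of this paper or that of Lin, Jiang, and Chao. The function name `max_sum_segment`, the 0-based inclusive indexing, and the tie-breaking rule (first best segment found) are assumptions made for illustration.

```python
from collections import deque
from typing import Optional, Sequence, Tuple


def max_sum_segment(
    a: Sequence[float], L: int, U: int
) -> Optional[Tuple[float, int, int]]:
    """Return (best_sum, i, j) for the segment a[i..j] (0-based, inclusive)
    with L <= j - i + 1 <= U that has the largest sum, or None if no
    segment satisfies the length bounds."""
    n = len(a)
    if L < 1 or L > U or L > n:
        return None

    # prefix[k] = a[0] + ... + a[k-1], so sum(a[i..j]) = prefix[j+1] - prefix[i].
    prefix = [0] * (n + 1)
    for k, x in enumerate(a):
        prefix[k + 1] = prefix[k] + x

    best = None
    # Candidate start indices, kept so that their prefix values are
    # strictly increasing from front to back; the front is the minimum.
    candidates = deque()

    for end in range(L, n + 1):  # end is the exclusive right boundary j + 1
        new_start = end - L      # newest start that still gives length >= L
        while candidates and prefix[candidates[-1]] >= prefix[new_start]:
            candidates.pop()
        candidates.append(new_start)

        # Discard starts that would make the segment longer than U.
        while candidates[0] < end - U:
            candidates.popleft()

        i = candidates[0]
        total = prefix[end] - prefix[i]
        if best is None or total > best[0]:
            best = (total, i, end - 1)

    return best
```

For example, `max_sum_segment([2, -5, 3, 4, -1, 2, -6], 2, 3)` selects the segment `[3, 4]` at indices 2 through 3, with sum 7. Each index enters and leaves the deque at most once, which is where the linear running time comes from.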