Algorithms for Finding Maximal-Scoring Segment Sets
We examine the problem of finding maximal-scoring sets of disjoint regions in a sequence of scores. The problem arises in DNA and protein segmentation, and in post-processing of sequence alignments. Our key result states a simple recursive relationship between maximal-scoring segment sets. The statement leads to an algorithm that finds such a k-set of segments in a sequence of length n in O(nk) time. We describe linear-time algorithms for finding optimal segment sets using different criteria for choosing k, as well as an algorithm for finding an optimal set of k segments in O(n log n) time, independently of k. We apply our methods to the identification of non-coding RNA genes in thermophiles.
Keywords: Hidden Markov Model, Maximal Cover, Maximal Chain, Minimum Description Length, Optimal Cover
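To make the problem statement concrete, the following is a minimal sketch of a textbook dynamic program that finds at most k disjoint segments of maximal total score in O(nk) time. It is not the recursive relationship the abstract refers to, which is not spelled out here; it is only a standard approach with the same time bound. The function name `max_scoring_segments`, the "at most k" convention, and the half-open `(start, stop)` segment representation are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def max_scoring_segments(
    scores: Sequence[float], k: int
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total, segments) for at most k disjoint segments.

    Segments are half-open index pairs (start, stop) in left-to-right order.
    Illustrative textbook DP; uses O(nk) time and O(nk) memory.
    """
    n = len(scores)
    neg_inf = float("-inf")
    # best[j][i]: best total using at most j segments within scores[:i].
    best = [[0] * (n + 1) for _ in range(k + 1)]
    # take[j][i]: best[j][i] is achieved by a segment ending at index i - 1.
    take = [[False] * (n + 1) for _ in range(k + 1)]
    # extend[j][i]: that segment also covers index i - 2.
    extend = [[False] * (n + 1) for _ in range(k + 1)]

    for j in range(1, k + 1):
        end_prev = neg_inf  # best total with the j-th segment ending at i - 2
        for i in range(1, n + 1):
            if end_prev > best[j - 1][i - 1]:
                end_here = scores[i - 1] + end_prev
                extend[j][i] = True
            else:
                end_here = scores[i - 1] + best[j - 1][i - 1]
            if end_here > best[j][i - 1]:
                best[j][i] = end_here
                take[j][i] = True
            else:
                best[j][i] = best[j][i - 1]
            end_prev = end_here

    # Trace back from the full sequence to recover the chosen segments.
    segments: List[Tuple[int, int]] = []
    i, j = n, k
    while i > 0 and j > 0:
        if not take[j][i]:
            i -= 1
            continue
        stop = i
        while extend[j][i]:
            i -= 1
        segments.append((i - 1, stop))
        i -= 1
        j -= 1
    segments.reverse()
    return best[k][n], segments


print(max_scoring_segments([2, -1, 3, -4, 5, -2, 1], 2))
# (9, [(0, 3), (4, 5)])
```

In this example the two selected segments are `[2, -1, 3]` and `[5]`. Because the sketch allows fewer than k segments, an all-negative sequence yields a total of 0 and an empty segment list.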