The Most Probable Labeling Problem in HMMs and Its Application to Bioinformatics
Hidden Markov models (HMMs) are often used for biological sequence annotation. Each sequence element is represented by states with the same label. A sequence should be annotated with the labeling of highest probability. Computing this most probable labeling was shown to be NP-hard by Lyngsø and Pedersen. We improve this result by proving the problem NP-hard for a fixed HMM. High-probability labelings are often found by heuristics, such as taking the labeling corresponding to the most probable state path. We introduce an efficient algorithm that computes the most probable labeling for a wide class of HMMs, including models previously used for transmembrane protein topology prediction and coding region detection.
Keywords: Hidden Markov Model, Probable Label, Emission Probability, Truth Assignment, Viterbi Algorithm
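The gap between the most probable state path and the most probable labeling can be seen on a very small model. The sketch below is illustrative only and is not the algorithm introduced in the paper: the three-state HMM, its probabilities, and the helper names (`path_probability`, `labeling_of_best_path`, `most_probable_labeling`) are all assumptions made up for this example, and both quantities are computed by brute-force enumeration of state paths, which is feasible only for toy inputs.

```python
from itertools import product

# Hypothetical toy HMM (not taken from the paper).
# State A carries label "x"; states B1 and B2 both carry label "y".
states = ["A", "B1", "B2"]
label = {"A": "x", "B1": "y", "B2": "y"}
start = {"A": 0.4, "B1": 0.3, "B2": 0.3}
# Every state loops on itself with probability 1.
trans = {s: {t: 1.0 if s == t else 0.0 for t in states} for s in states}
# Single-symbol alphabet, so emissions do not influence the result.
emit = {s: {"a": 1.0} for s in states}


def path_probability(path, seq):
    """Joint probability of a state path and an observed sequence."""
    p = start[path[0]] * emit[path[0]].get(seq[0], 0.0)
    for prev, cur, sym in zip(path, path[1:], seq[1:]):
        p *= trans[prev][cur] * emit[cur].get(sym, 0.0)
    return p


def labeling_of_best_path(seq):
    """Heuristic: labeling of the single most probable state path.

    Found here by enumeration; the Viterbi algorithm would find the
    same path with dynamic programming.
    """
    best = max(product(states, repeat=len(seq)),
               key=lambda path: path_probability(path, seq))
    return "".join(label[s] for s in best)


def most_probable_labeling(seq):
    """Exact answer: sum the probabilities of all paths per labeling."""
    totals = {}
    for path in product(states, repeat=len(seq)):
        lab = "".join(label[s] for s in path)
        totals[lab] = totals.get(lab, 0.0) + path_probability(path, seq)
    best = max(totals, key=totals.get)
    return best, totals[best]


seq = "aaa"
print(labeling_of_best_path(seq))
print(most_probable_labeling(seq))
```

In this toy model the single best path stays in A (probability 0.4), so the heuristic reports the labeling `xxx`, while the two paths through B1 and B2 (0.3 each) share the labeling `yyy` and together reach 0.6. The enumeration grows exponentially with sequence length, which is consistent with the hardness result stated above for the general problem.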
- 5. Iseli, C., Jongeneel, C.V., Bucher, P.: ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. In: Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB), pp. 138–148 (1999)
- 6. Krogh, A.: Two methods for improving performance of an HMM and their application for gene finding. In: Fifth International Conference on Intelligent Systems for Molecular Biology (ISMB), pp. 179–186. AAAI Press, Menlo Park (1997)
- 10. Martelli, P.L., Fariselli, P., Krogh, A., Casadio, R.: A sequence-profile-based HMM for predicting and discriminating beta barrel membrane proteins. Bioinformatics 18(1), S46–53 (2002)