Advertisement

The Most Probable Labeling Problem in HMMs and Its Application to Bioinformatics

  • Broňa Brejová
  • Daniel G. Brown
  • Tomáš Vinař
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3240)

Abstract

Hidden Markov models (HMMs) are often used for biological sequence annotation. Each sequence element is represented by states with the same label. A sequence should be annotated with the labeling of highest probability. Computing this most probable labeling was shown NP-hard by Lyngsø and Pedersen [9]. We improve this result by proving the problem NP-hard for a fixed HMM. High probability labelings are often found by heuristics, such as taking the labeling corresponding to the most probable state path. We introduce an efficient algorithm that computes the most probable labeling for a wide class of HMMs, including models previously used for transmembrane protein topology prediction and coding region detection.

Keywords

Hide Markov Model Probable Label Emission Probability Truth Assignment Viterbi Algorithm 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Burge, C., Karlin, S.: Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268(1), 78–94 (1997)CrossRefGoogle Scholar
  2. 2.
    Casacuberta, F., de la Higuera, C.: Computational Complexity of Problems on Probabilistic Grammars and Transducers. In: Oliveira, A.L. (ed.) ICGI 2000. LNCS (LNAI), vol. 1891, pp. 15–24. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  3. 3.
    Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological sequence analysis. Cambridge University Press, Cambridge (1998)zbMATHCrossRefGoogle Scholar
  4. 4.
    Eppstein, D.: Finding the k shortest paths. SIAM Journal on Computing 28(2), 652–673 (1998)zbMATHCrossRefMathSciNetGoogle Scholar
  5. 5.
    Iseli, C., Jongeneel, C.V., Bucher, P.: ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. In: Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB), pp. 138–148 (1999)Google Scholar
  6. 6.
    Krogh, A.: Two methods for improving performance of an HMM and their application for gene finding. In: Fifth International Conference on Intelligent Systems for Molecular Biology (ISMB), pp. 179–186. AAAI Press, Menlo Park (1997)Google Scholar
  7. 7.
    Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology 305(3), 567–570 (2001)CrossRefGoogle Scholar
  8. 8.
    Lio, P., Goldman, N., Thorne, J.L., Jones, D.T.: PASSML: combining evolutionary inference and protein secondary structure prediction. Bioinformatics 14(8), 726–733 (1998)CrossRefGoogle Scholar
  9. 9.
    Lyngsø, R.B., Pedersen, C.N.S.: The consensus string problem and the complexity of comparing hidden Markov models. Journal of Computer and System Sciences 65(3), 545–569 (2002)CrossRefMathSciNetGoogle Scholar
  10. 10.
    Martelli, P.L., Fariselli, P., Krogh, A., Casadio, R.: A sequence-profile-based HMM for predicting and discriminating beta barrel membrane proteins. Bioinformatics 18(1), S46–53 (2002)Google Scholar
  11. 11.
    Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–285 (1989)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Broňa Brejová
    • 1
  • Daniel G. Brown
    • 1
  • Tomáš Vinař
    • 1
  1. 1.School of Computer ScienceUniversity of WaterlooWaterlooCanada

Personalised recommendations