Speeding Up Bayesian HMM by the Four Russians Method

  • Conference paper
Algorithms in Bioinformatics (WABI 2011)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6833)

Abstract

Bayesian computations with Hidden Markov Models (HMMs) are often avoided in practice. Instead, because of their shorter running time, point estimates, either maximum likelihood (ML) or maximum a posteriori (MAP), are obtained and observation sequences are segmented based on the Viterbi path, even though the lack of accuracy and the dependency on the starting points of the local optimization are well known. We propose a method to speed up Bayesian computations which addresses this problem for regular and time-dependent HMMs with discrete observations. In particular, we show that by exploiting sequence repetitions, using the four Russians method, and the conditional dependency structure, it is possible to achieve a Θ(log T) speed-up, where T is the length of the observation sequence. Our experimental results on the identification of segments of homogeneous nucleic acid composition, known as the DNA segmentation problem, show that the speed-up is also observed in practice.

Availability: An implementation of our method will be available as part of the open source GHMM library from http://ghmm.org.
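
The block-precomputation idea behind the four Russians method can be illustrated on the plain forward likelihood of a discrete HMM. The sketch below is not the authors' Bayesian sampler and does not use the GHMM API; all function names (`step_matrix`, `likelihood_plain`, `likelihood_blocked`) and the toy parameters are assumptions made for illustration. Each forward step is written as a vector-matrix product, the matrix products for every possible block of k symbols are tabulated once, and the sequence is then processed one block at a time instead of one symbol at a time.

```python
from itertools import product


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def vec_mat(v, m):
    """Multiply a row vector by a square matrix."""
    n = len(v)
    return [sum(v[i] * m[i][j] for i in range(n)) for j in range(n)]


def step_matrix(trans, emit, symbol):
    """One forward step as a matrix: M[i][j] = trans[i][j] * emit[j][symbol]."""
    n = len(trans)
    return [[trans[i][j] * emit[j][symbol] for j in range(n)] for i in range(n)]


def likelihood_plain(init, trans, emit, obs):
    """Standard forward algorithm: one vector-matrix product per symbol."""
    alpha = [init[j] * emit[j][obs[0]] for j in range(len(init))]
    for symbol in obs[1:]:
        alpha = vec_mat(alpha, step_matrix(trans, emit, symbol))
    return sum(alpha)


def likelihood_blocked(init, trans, emit, obs, k):
    """Forward likelihood using a table of products for all k-symbol blocks."""
    n_symbols = len(emit[0])
    steps = [step_matrix(trans, emit, s) for s in range(n_symbols)]

    # Precompute the matrix product for every possible block of k symbols.
    table = {}
    for word in product(range(n_symbols), repeat=k):
        m = steps[word[0]]
        for s in word[1:]:
            m = mat_mul(m, steps[s])
        table[word] = m

    alpha = [init[j] * emit[j][obs[0]] for j in range(len(init))]
    rest = obs[1:]
    full = len(rest) - len(rest) % k

    # One table lookup and one vector-matrix product per complete block.
    for start in range(0, full, k):
        alpha = vec_mat(alpha, table[tuple(rest[start:start + k])])

    # Symbols left over after the last complete block are handled one by one.
    for symbol in rest[full:]:
        alpha = vec_mat(alpha, steps[symbol])
    return sum(alpha)


if __name__ == "__main__":
    # Toy two-state model over a four-letter alphabet (0..3), values assumed.
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.2, 0.8]]
    emit = [[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]]
    obs = [0, 3, 0, 1, 2, 2, 1, 0, 3, 3, 0, 1, 2]
    print(likelihood_plain(init, trans, emit, obs))
    print(likelihood_blocked(init, trans, emit, obs, k=3))
```

Both functions compute the same quantity because matrix multiplication is associative; they can differ only by floating-point rounding. The table has one entry per possible block, so its size grows exponentially in k, which is why the block length is kept small relative to the sequence length.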

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mahmud, M.P., Schliep, A. (2011). Speeding Up Bayesian HMM by the Four Russians Method. In: Przytycka, T.M., Sagot, M.-F. (eds) Algorithms in Bioinformatics. WABI 2011. Lecture Notes in Computer Science, vol 6833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23038-7_17

  • DOI: https://doi.org/10.1007/978-3-642-23038-7_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23037-0

  • Online ISBN: 978-3-642-23038-7

  • eBook Packages: Computer Science, Computer Science (R0)
