Abstract
Bayesian computations with Hidden Markov Models (HMMs) are often avoided in practice. Instead, because they run faster, point estimates (maximum likelihood (ML) or maximum a posteriori (MAP)) are obtained, and observation sequences are segmented based on the Viterbi path, even though the lack of accuracy and the dependence of the local optimization on its starting points are well known. We propose a method to speed up Bayesian computations that addresses this problem for regular and time-dependent HMMs with discrete observations. In particular, we show that by exploiting sequence repetitions, using the four Russians method, and the conditional dependency structure, it is possible to achieve a Θ(log T) speed-up, where T is the length of the observation sequence. Our experimental results on the identification of segments of homogeneous nucleic acid composition, known as the DNA segmentation problem, show that the speed-up is also observed in practice.
Availability: An implementation of our method will be available as part of the open source GHMM library from http://ghmm.org.
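As an illustration of the block-precomputation idea behind the four Russians method, the following is a minimal sketch and not the authors' implementation or the GHMM API. It covers only the forward likelihood of a regular HMM with discrete observations, not the Bayesian sampling the paper addresses. All function names (`symbol_matrix`, `precompute_blocks`, `likelihood_naive`, `likelihood_blocked`) and the block length `k` are hypothetical choices made for this example. The sketch precomputes one matrix product per possible block of `k` symbols, so that the forward pass takes one vector-matrix step per block instead of one per symbol.

```python
from itertools import product


def mat_mul(X, Y):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]


def symbol_matrix(A, B, o):
    """M_o[i][j] = A[i][j] * B[j][o]: one forward step for symbol o."""
    n = len(A)
    return [[A[i][j] * B[j][o] for j in range(n)] for i in range(n)]


def precompute_blocks(A, B, n_symbols, k):
    """Table of products M_{o1} ... M_{ok} for every block of k symbols.

    The table has n_symbols**k entries, so k must stay small
    (assumption: k is chosen on the order of log T).
    """
    single = [symbol_matrix(A, B, o) for o in range(n_symbols)]
    table = {}
    for block in product(range(n_symbols), repeat=k):
        M = single[block[0]]
        for o in block[1:]:
            M = mat_mul(M, single[o])
        table[block] = M
    return single, table


def likelihood_naive(pi, A, B, obs):
    """Standard forward algorithm: one step per observation."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(len(pi))]
    for o in obs[1:]:
        alpha = vec_mat(alpha, symbol_matrix(A, B, o))
    return sum(alpha)


def likelihood_blocked(pi, A, B, obs, n_symbols, k):
    """Forward algorithm taking one step per block of k observations."""
    single, table = precompute_blocks(A, B, n_symbols, k)
    alpha = [pi[i] * B[i][obs[0]] for i in range(len(pi))]
    rest = obs[1:]
    full = len(rest) - len(rest) % k
    for s in range(0, full, k):
        alpha = vec_mat(alpha, table[tuple(rest[s:s + k])])
    for o in rest[full:]:  # leftover symbols that do not fill a block
        alpha = vec_mat(alpha, single[o])
    return sum(alpha)


# Toy two-state model over a four-letter alphabet (values are made up).
pi = [0.6, 0.4]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]]
obs = [0, 3, 1, 2, 2, 1, 0, 0, 3, 1, 2]
p_naive = likelihood_naive(pi, A, B, obs)
p_blocked = likelihood_blocked(pi, A, B, obs, n_symbols=4, k=3)
```

Both functions compute the same likelihood; the blocked variant trades a precomputed table for fewer steps along the sequence. The sketch omits the scaling or log-space arithmetic that long sequences would need to avoid numerical underflow.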
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mahmud, M.P., Schliep, A. (2011). Speeding Up Bayesian HMM by the Four Russians Method. In: Przytycka, T.M., Sagot, MF. (eds) Algorithms in Bioinformatics. WABI 2011. Lecture Notes in Computer Science, vol 6833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23038-7_17
DOI: https://doi.org/10.1007/978-3-642-23038-7_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23037-0
Online ISBN: 978-3-642-23038-7
eBook Packages: Computer Science (R0)