Speeding Up Bayesian HMM by the Four Russians Method

Mahmud, Md Pavel; Schliep, Alexander

doi:10.1007/978-3-642-23038-7_17

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6833)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Bayesian computations with Hidden Markov Models (HMMs) are often avoided in practice. Instead, due to reduced running time, point estimates – maximum likelihood (ML) or maximum a posteriori (MAP) – are obtained and observation sequences are segmented based on the Viterbi path, even though the lack of accuracy and the dependency on the starting points of the local optimization are well known. We propose a method to speed up Bayesian computations which addresses this problem for regular and time-dependent HMMs with discrete observations. In particular, we show that by exploiting sequence repetitions, using the four Russians method, and the conditional dependency structure, it is possible to achieve a Θ(log T) speed-up, where T is the length of the observation sequence. Our experimental results on identification of segments of homogeneous nucleic acid composition, known as the DNA segmentation problem, show that the speed-up is also observed in practice.

Availability: An implementation of our method will be available as part of the open source GHMM library from http://ghmm.org .
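
The block-precomputation idea behind the four Russians method named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the GHMM API: it covers only the forward likelihood of a regular HMM with discrete observations, not the Bayesian sampling the paper addresses, and all function names (step_matrix, build_block_table, forward_blocked) are hypothetical. The assumption is that the products of per-symbol transition-emission matrices are tabulated once for every word of length k, so that the forward pass needs roughly T/k matrix-vector products instead of T; choosing k on the order of log T is what would give a logarithmic saving.

```python
from itertools import product


def vec_mat(v, M):
    """Row vector times matrix: (v M)[j] = sum_i v[i] * M[i][j]."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def mat_mul(X, Y):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def step_matrix(A, B, o):
    """M_o[i][j] = A[i][j] * B[j][o]: move from state i to j, then emit o in j."""
    n = len(A)
    return [[A[i][j] * B[j][o] for j in range(n)] for i in range(n)]


def forward_naive(pi, A, B, obs):
    """Standard forward algorithm: one matrix-vector product per observation."""
    alpha = [pi[j] * B[j][obs[0]] for j in range(len(pi))]
    for o in obs[1:]:
        alpha = vec_mat(alpha, step_matrix(A, B, o))
    return sum(alpha)


def build_block_table(A, B, n_symbols, k):
    """Tabulate M_{w1} M_{w2} ... M_{wk} for every word w of length k."""
    steps = [step_matrix(A, B, o) for o in range(n_symbols)]
    table = {}
    for word in product(range(n_symbols), repeat=k):
        M = steps[word[0]]
        for o in word[1:]:
            M = mat_mul(M, steps[o])
        table[word] = M
    return table


def forward_blocked(pi, A, B, obs, n_symbols, k):
    """Forward likelihood using one table lookup per block of k observations."""
    table = build_block_table(A, B, n_symbols, k)
    alpha = [pi[j] * B[j][obs[0]] for j in range(len(pi))]
    t = 1
    # Consume full blocks of length k with a single matrix-vector product each.
    while t + k <= len(obs):
        alpha = vec_mat(alpha, table[tuple(obs[t:t + k])])
        t += k
    # Handle the remaining (fewer than k) observations one at a time.
    for o in obs[t:]:
        alpha = vec_mat(alpha, step_matrix(A, B, o))
    return sum(alpha)
```

For a DNA sequence encoded as integers 0 to 3, forward_blocked(pi, A, B, obs, 4, k) should agree with forward_naive(pi, A, B, obs) up to floating-point rounding. The sketch omits scaling, so long sequences would underflow; a practical version would normalize alpha at each block.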

Author information

Authors and Affiliations

Department of Computer Science, Rutgers University, New Jersey, USA
Md Pavel Mahmud & Alexander Schliep
BioMaPS Institute for Quantitative Biology, Rutgers University, New Jersey, USA
Alexander Schliep

Editor information

Editors and Affiliations

National Center for Biotechnology Information, U.S. National Library of Medicine, 8600 Rockville Pike, 20894, Bethesda, MD, USA
Teresa M. Przytycka
Institut National de Recherche en Informatique et en Automatique (INRIA) and Université Lyon 1 (UCBL), 43 bd du 11 Novembre 1918, 69622, Villeurbanne cedex, France
Marie-France Sagot

About this paper

Cite this paper

Mahmud, M.P., Schliep, A. (2011). Speeding Up Bayesian HMM by the Four Russians Method. In: Przytycka, T.M., Sagot, MF. (eds) Algorithms in Bioinformatics. WABI 2011. Lecture Notes in Computer Science, vol 6833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23038-7_17

DOI: https://doi.org/10.1007/978-3-642-23038-7_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23037-0
Online ISBN: 978-3-642-23038-7
eBook Packages: Computer Science, Computer Science (R0)