Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions

  • Conference paper
Combinatorial Pattern Matching (CPM 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4580)

Abstract

We present a method to speed up the dynamic programming algorithms used to solve the HMM decoding and training problems for discrete time-independent HMMs. We discuss the application of our method to Viterbi’s decoding and training algorithms [21], as well as to the forward-backward and Baum-Welch [4] algorithms. Our approach is based on identifying repeated substrings in the observed input sequence. We describe three algorithms, based respectively on byte pair encoding (BPE) [19], run length encoding (RLE), and Lempel-Ziv (LZ78) parsing [12]. Compared to Viterbi’s algorithm, we achieve a speedup of \(\Omega(r)\) using BPE, a speedup of \(\Omega(\frac{r}{\log r})\) using RLE, and a speedup of \(\Omega(\frac{\log n}{k})\) using LZ78, where \(k\) is the number of hidden states, \(n\) is the length of the observed sequence, and \(r\) is its compression ratio (under each compression scheme). Our experimental results demonstrate that our new algorithms are indeed faster in practice. Furthermore, unlike Viterbi’s algorithm, our algorithms are highly parallelizable.
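
To make the idea of exploiting repetitions concrete, the following is a minimal sketch of how run length encoding could shortcut Viterbi's recurrence. It is an illustration under our own assumptions, not the authors' implementation. One Viterbi step for an observed symbol is written as a (max, +) matrix-vector product with a \(k \times k\) matrix, so a run of \(l\) identical symbols corresponds to the \(l\)-th (max, +) power of that matrix, which repeated squaring obtains with \(O(\log l)\) matrix products. All names (`step_matrix`, `maxplus_power`, `viterbi_score_rle`), the convention that `init` is a distribution over the state held before the first observation, and the toy model are hypothetical. The sketch returns only the log-probability of the best state path and does not recover the path itself.

```python
import math
from itertools import groupby

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def maxplus_matmul(A, B):
    """Matrix product over the (max, +) semiring."""
    k = len(A)
    return [[max(A[i][m] + B[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]

def maxplus_matvec(A, v):
    """Matrix-vector product over the (max, +) semiring."""
    return [max(A[i][j] + v[j] for j in range(len(v))) for i in range(len(A))]

def maxplus_power(M, l):
    """l-th (max, +) power of M for l >= 1, by repeated squaring."""
    result, base = None, M
    while l > 0:
        if l & 1:
            result = base if result is None else maxplus_matmul(base, result)
        base = maxplus_matmul(base, base)
        l >>= 1
    return result

def step_matrix(trans, emit, c):
    """M[i][j] = log emit[i][c] + log trans[j][i]: move from j to i, then emit c."""
    k = len(trans)
    return [[safe_log(emit[i][c]) + safe_log(trans[j][i]) for j in range(k)]
            for i in range(k)]

def viterbi_score_plain(trans, emit, init, obs):
    """Baseline: one (max, +) matrix-vector product per observed symbol."""
    v = [safe_log(p) for p in init]
    for c in obs:
        v = maxplus_matvec(step_matrix(trans, emit, c), v)
    return max(v)

def viterbi_score_rle(trans, emit, init, obs):
    """Same score, but each run of equal symbols is handled by one matrix power."""
    v = [safe_log(p) for p in init]
    for c, run in groupby(obs):
        run_length = sum(1 for _ in run)
        v = maxplus_matvec(maxplus_power(step_matrix(trans, emit, c), run_length), v)
    return max(v)

if __name__ == "__main__":
    # Toy two-state model; all numbers are made up for illustration.
    trans = [[0.9, 0.1], [0.2, 0.8]]          # trans[j][i] = P(next = i | current = j)
    emit = [{"a": 0.7, "b": 0.3}, {"a": 0.1, "b": 0.9}]
    init = [0.5, 0.5]
    obs = "aaaaabbbbbbbaaa"
    print(viterbi_score_plain(trans, emit, init, obs))
    print(viterbi_score_rle(trans, emit, init, obs))
```

Both functions should agree up to floating-point rounding. In this naive form each squaring costs \(O(k^3)\) while a plain step costs \(O(k^2)\), so the sketch pays off only when runs are long relative to \(k\).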


References

  1. Amir, A., Benson, G., Farach, M.: Let sleeping files lie: Pattern matching in Z-compressed files. Journal of Computer and System Sciences 52(2), 299–307 (1996)

  2. Agazzi, O., Kuo, S.: HMM based optical character recognition in the presence of deterministic transformations. Pattern Recognition 26, 1813–1826 (1993)

  3. Apostolico, A., Landau, G.M., Skiena, S.: Matching for run length encoded strings. Journal of Complexity 15(1), 4–16 (1999)

  4. Baum, L.E.: An inequality and associated maximization technique in statistical estimation for probabilistic functions of a Markov process. Inequalities 3, 1–8 (1972)

  5. Bird, A.P.: CpG-rich islands as gene markers in the vertebrate nucleus. Trends in Genetics 3, 342–347 (1987)

  6. Buchsbaum, A.L., Giancarlo, R.: Algorithmic aspects in speech recognition: An introduction. ACM Journal of Experimental Algorithmics 2(1) (1997)

  7. Bunke, H., Csirik, J.: An improved algorithm for computing the edit distance of run length coded strings. Information Processing Letters 54, 93–96 (1995)

  8. Chan, T.M.: All-pairs shortest paths with real weights in \(O(n^3/\log n)\) time. In: Proc. 9th Workshop on Algorithms and Data Structures, pp. 318–324 (2005)

  9. Churchill, G.A.: Hidden Markov chains and the analysis of genome structure. Computers Chem. 16, 107–115 (1992)

  10. Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetical progressions. Journal of Symbolic Computation 9, 251–280 (1990)

  11. Crochemore, M., Landau, G., Ziv-Ukelson, M.: A sub-quadratic sequence alignment algorithm for unrestricted cost matrices. In: Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 679–688 (2002)

  12. Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis. Cambridge University Press, Cambridge (1998)

  13. Kärkkäinen, J., Navarro, G., Ukkonen, E.: Approximate string matching over Ziv-Lempel compressed text. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 195–209. Springer, Heidelberg (2000)

  14. Kärkkäinen, J., Ukkonen, E.: Lempel-Ziv parsing and sublinear-size index structures for string matching. In: Proc. Third South American Workshop on String Processing (WSP), pp. 141–155 (1996)

  15. Mäkinen, V., Navarro, G., Ukkonen, E.: Approximate matching of run-length compressed strings. In: Proc. 12th Annual Symposium on Combinatorial Pattern Matching (CPM). LNCS, vol. 1645, pp. 1–13. Springer, Heidelberg (1999)

  16. Manber, U.: A text compression scheme that allows fast searching directly in the compressed file. In: CPM 2001. LNCS, vol. 2089, pp. 31–49. Springer, Heidelberg (2001)

  17. Manning, C., Schütze, H.: Statistical Natural Language Processing. MIT Press, Cambridge (1999)

  18. Navarro, G., Kida, T., Takeda, M., Shinohara, A., Arikawa, S.: Faster approximate string matching over compressed text. In: Proc. Data Compression Conference (DCC), pp. 459–468 (2001)

  19. Shibata, Y., Kida, T., Fukamachi, S., Takeda, M., Shinohara, A., Shinohara, T., Arikawa, S.: Speeding up pattern matching by text compression. In: Bongiovanni, G., Petreschi, R., Gambosi, G. (eds.) CIAC 2000. LNCS, vol. 1767, pp. 306–315. Springer, Heidelberg (2000)

  20. Strassen, V.: Gaussian elimination is not optimal. Numerische Mathematik 13, 354–356 (1969)

  21. Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory IT-13, 260–269 (1967)

  22. Ziv, J., Lempel, A.: On the complexity of finite sequences. IEEE Transactions on Information Theory 22(1), 75–81 (1976)

Editor information

Bin Ma, Kaizhong Zhang

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mozes, S., Weimann, O., Ziv-Ukelson, M. (2007). Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_4

  • DOI: https://doi.org/10.1007/978-3-540-73437-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73436-9

  • Online ISBN: 978-3-540-73437-6

  • eBook Packages: Computer Science, Computer Science (R0)
