Algorithm Optimizations: Low Computational Complexity

Novak, Miroslav

doi:10.1007/978-1-84800-143-5_10

Part of the book series: Advances in Pattern Recognition (ACVPR)

Advances in automatic speech recognition (ASR) are driven both by scientific achievements in the field and by the availability of more powerful hardware. While very powerful CPUs allow us to use ever more complex algorithms in server-based large vocabulary ASR systems (e.g., in telephony applications), the capability of embedded platforms will always lag behind. Nevertheless, as the popularity of ASR applications grows, we can expect an increasing demand for functionality on embedded platforms as well. For example, replacing simple command-and-control, grammar-based applications with natural language understanding (NLU) systems leads to larger vocabularies and thus to the need for greater CPU performance. In this chapter we present an overview of ASR decoder design options, with an emphasis on techniques that are suitable for embedded platforms. One needs to keep in mind that there is no one-size-fits-all solution; some algorithmic improvements pay off only in highly restricted applications or scenarios. The best solution is usually reached by choosing algorithms that maximize specific benefits for a particular platform and task.

Author information

Authors and Affiliations

Speech and Language Technologies, IBM T.J. Watson Research Center, USA
Miroslav Novak

About this chapter

Cite this chapter

Novak, M. (2008). Algorithm Optimizations: Low Computational Complexity. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_10

DOI: https://doi.org/10.1007/978-1-84800-143-5_10
Publisher Name: Springer, London
Print ISBN: 978-1-84800-142-8
Online ISBN: 978-1-84800-143-5
eBook Packages: Computer Science; Computer Science (R0)