Advances in ASR are driven by both scientific achievements in the field and the availability of more powerful hardware. While very powerful CPUs allow us to use ever more complex algorithms in server-based large vocabulary ASR systems (e.g. in telephony applications), the capability of embedded platforms will always lag behind. Nevertheless, as the popularity of ASR applications grows, we can expect an increasing demand for functionality on embedded platforms as well. For example, replacing simple command-and-control grammar-based applications with natural language understanding (NLU) systems leads to increased vocabulary sizes and thus the need for greater CPU performance. In this chapter we present an overview of ASR decoder design options, with an emphasis on techniques that are suitable for embedded platforms. One needs to keep in mind that there is no one-size-fits-all solution; some algorithmic improvements are best applied only to highly restricted applications or scenarios. The best solution is usually reached by choosing algorithms that maximize specific benefits for a particular platform and task.
© 2008 Springer-Verlag London Limited
About this chapter
Cite this chapter
Novak, M. (2008). Algorithm Optimizations: Low Computational Complexity. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_10
DOI: https://doi.org/10.1007/978-1-84800-143-5_10
Publisher Name: Springer, London
Print ISBN: 978-1-84800-142-8
Online ISBN: 978-1-84800-143-5
eBook Packages: Computer Science, Computer Science (R0)