Part of the book series: Advances in Pattern Recognition (ACVPR)

Advances in ASR are driven by both scientific achievements in the field and the availability of more powerful hardware. While very powerful CPUs allow us to use ever more complex algorithms in server-based large vocabulary ASR systems (e.g. in telephony applications), the capability of embedded platforms will always lag behind. Nevertheless, as the popularity of ASR applications grows, we can expect an increasing demand for functionality on embedded platforms as well. For example, replacing simple grammar-based command and control applications with natural language understanding (NLU) systems leads to increased vocabulary sizes and thus the need for greater CPU performance. In this chapter we present an overview of ASR decoder design options, with an emphasis on techniques that are suitable for embedded platforms. One needs to keep in mind that there is no one-size-fits-all solution; specific algorithmic improvements may be best applied only to highly restricted applications or scenarios. The optimal solution can usually be achieved by choosing algorithms that maximize specific benefits for a particular platform and task.

Copyright information

© 2008 Springer-Verlag London Limited

About this chapter

Cite this chapter

Novak, M. (2008). Algorithm Optimizations: Low Computational Complexity. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_10

  • DOI: https://doi.org/10.1007/978-1-84800-143-5_10

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-142-8

  • Online ISBN: 978-1-84800-143-5

  • eBook Packages: Computer Science, Computer Science (R0)
