Speech recognition for mobile devices

  • Alexander Schmitt
  • Dmitry Zaykovskiy
  • Wolfgang Minker


This article presents an overview of approaches for providing automatic speech recognition (ASR) technology to mobile users. Three principal system architectures are analyzed with respect to their use of a wireless communication link: embedded speech recognition systems, network speech recognition (NSR), and distributed speech recognition (DSR). The article surveys the solutions standardized so far and critically analyzes the latest developments in speech recognition in mobile environments. Open issues, as well as the pros and cons of the different methodologies and techniques, are highlighted. Special emphasis is placed on the constraints and limitations that ASR applications face under each architecture.


Keywords: Embedded speech recognition systems; Network speech recognition; Distributed speech recognition; Mobile devices





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Alexander Schmitt (1)
  • Dmitry Zaykovskiy (1)
  • Wolfgang Minker (1)

  1. Institute of Information Technology, University of Ulm, Ulm/Donau, Germany
