Automatic Speech Recognition

Deep Learning for NLP and Speech Recognition

Abstract

Automatic speech recognition (ASR) has grown tremendously in recent years, with deep learning playing a key role. Simply put, ASR is the task of converting spoken language into computer-readable text (Fig. 8.1). It has quickly become a ubiquitous way to interact with technology, narrowing the gap in human–computer interaction and making that interaction more natural.

Notes

  1. https://voice.mozilla.org/en/data.

  2. http://sox.sourceforge.net/.

  3. http://www.speech.cs.cmu.edu/tools/lextool.html.

  4. If there are additional labels, such as speaker and gender, these can also be used in the process. Common Voice does not have these labels, so each utterance is treated independently.

  5. Note: It is possible to add specific words to the lexicon by editing the lexicon-iv.txt file.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kamath, U., Liu, J., Whitaker, J. (2019). Automatic Speech Recognition. In: Deep Learning for NLP and Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-14596-5_8

  • DOI: https://doi.org/10.1007/978-3-030-14596-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14595-8

  • Online ISBN: 978-3-030-14596-5

  • eBook Packages: Computer Science, Computer Science (R0)
