An Overview of Speech Recognition Systems

Cross-Word Modeling for Arabic Speech Recognition

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

This chapter introduces automatic speech recognition systems, including the mathematical formulation of speech recognizers. It describes the main components of a speech recognition system: front-end signal processing, acoustic models, the language model, the pronunciation dictionary, decoding, and training. A brief literature review of speech recognition systems is provided. The Viterbi and Baum–Welch algorithms are discussed as the fundamental techniques for the decoding and training phases, respectively.
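
As an illustration of the decoding technique named in the abstract, the following is a minimal sketch of Viterbi decoding over a discrete hidden Markov model. It is not taken from the chapter: the two states ("sil", "speech"), the quantized energy symbols ("low", "mid", "high"), and all probabilities are hypothetical toy values chosen only to make the example runnable. A real recognizer would use continuous acoustic features and far larger state spaces.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # trellis[t][s]: best log-probability of any state path ending in s at time t
    trellis = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    backptr = [{}]
    for t in range(1, len(observations)):
        trellis.append({})
        backptr.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s
            prev, score = max(
                ((p, trellis[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            trellis[t][s] = score + log_emit[s][observations[t]]
            backptr[t][s] = prev
    # Trace back from the best final state to recover the full path
    last = max(states, key=lambda s: trellis[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return trellis[-1][last], path

def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model (assumed values, not from the chapter)
states = ["sil", "speech"]
log_start = to_log({"sil": 0.6, "speech": 0.4})
log_trans = to_log({"sil": {"sil": 0.7, "speech": 0.3},
                    "speech": {"sil": 0.4, "speech": 0.6}})
log_emit = to_log({"sil": {"low": 0.5, "mid": 0.4, "high": 0.1},
                   "speech": {"low": 0.1, "mid": 0.3, "high": 0.6}})

best_logp, best_path = viterbi(["low", "mid", "high"], states,
                               log_start, log_trans, log_emit)
print(best_path, math.exp(best_logp))
```

Working in log-probabilities avoids numerical underflow on long observation sequences, which is why the sketch sums scores instead of multiplying probabilities.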

Author information

Corresponding author

Correspondence to Dia AbuZeina.

Copyright information

© 2012 Dia AbuZeina

About this chapter

Cite this chapter

AbuZeina, D., Elshafei, M. (2012). An Overview of Speech Recognition Systems. In: Cross-Word Modeling for Arabic Speech Recognition. SpringerBriefs in Electrical and Computer Engineering. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-1213-7_1

  • DOI: https://doi.org/10.1007/978-1-4614-1213-7_1

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4614-1212-0

  • Online ISBN: 978-1-4614-1213-7

  • eBook Packages: Engineering, Engineering (R0)
