An Overview of Speech Recognition Systems

AbuZeina, Dia; Elshafei, Moustafa

doi:10.1007/978-1-4614-1213-7_1

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

This chapter introduces automatic speech recognition systems and presents the mathematical formulation of speech recognizers. It describes the main components of such systems: front-end signal processing, acoustic models, decoding, training, the language model, and the pronunciation dictionary. A brief literature review of speech recognition systems is also provided. The Viterbi and Baum–Welch algorithms are discussed as the fundamental techniques for the decoding and training phases, respectively.

Author information

Authors and Affiliations

King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
Dia AbuZeina & Moustafa Elshafei

Corresponding author

Correspondence to Dia AbuZeina.

About this chapter

Cite this chapter

AbuZeina, D., Elshafei, M. (2012). An Overview of Speech Recognition Systems. In: Cross-Word Modeling for Arabic Speech Recognition. SpringerBriefs in Electrical and Computer Engineering. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-1213-7_1

DOI: https://doi.org/10.1007/978-1-4614-1213-7_1
Published: 15 November 2011
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-1212-0
Online ISBN: 978-1-4614-1213-7
eBook Packages: Engineering, Engineering (R0)
