Abstract
This chapter introduces automatic speech recognition (ASR) systems, including the mathematical formulation of a speech recognizer. It describes the main components of an ASR system: front-end signal processing, acoustic models, the pronunciation dictionary, the language model, decoding, and training. It also gives a brief literature review of speech recognition systems and discusses the Viterbi and Baum–Welch algorithms as the fundamental techniques for the decoding and training phases, respectively.
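The mathematical formulation mentioned above is usually stated as the Bayes decision rule. The block below gives that standard textbook form as a sketch; the symbols are assumptions and may differ from the chapter's own notation. O is the sequence of acoustic feature vectors produced by the front end, and W ranges over candidate word sequences.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```

P(O) does not depend on W, so it drops out of the maximization. P(O | W) is supplied by the acoustic models, with the pronunciation dictionary mapping each word to its phone sequence. P(W) is supplied by the language model. The decoder carries out the search over W.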
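The abstract names the Viterbi algorithm as the core of the decoding phase. The Python sketch below is only an illustration and is not the chapter's implementation. It runs Viterbi over a toy hidden Markov model with discrete observations. The state names `S1`/`S2`, the codeword indices 0 to 2, and all probabilities are made-up values. Practical decoders search word networks built from the pronunciation dictionary and language model, typically with beam pruning, and none of that is shown here.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state path for a discrete-observation HMM.

    start_p[s], trans_p[s][t] and emit_p[s][o] are plain probabilities;
    scores are accumulated as log probabilities to avoid underflow.
    Returns (best_path, best_log_prob).
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # delta[s]: best log score of any state path ending in s at the current frame
    delta = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    backptrs = []  # for frames 1..T-1: state -> best predecessor state
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for t in states:
            # Recursion: keep only the best-scoring predecessor of state t
            best = max(states, key=lambda s: prev[s] + log(trans_p[s][t]))
            delta[t] = prev[best] + log(trans_p[best][t]) + log(emit_p[t][o])
            ptr[t] = best
        backptrs.append(ptr)

    # Termination, then backtracking from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical toy model: two states emitting vector-quantized codeword indices 0..2
states = ("S1", "S2")
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {"S1": [0.5, 0.4, 0.1], "S2": [0.1, 0.3, 0.6]}

path, log_p = viterbi([0, 1, 2], states, start_p, trans_p, emit_p)
print(path, math.exp(log_p))
```

For the observation sequence 0, 1, 2, the best path is S1, S1, S2, with a joint probability of about 0.01512. The computation uses log probabilities because products of many small probabilities underflow quickly on real utterances.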
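For the training phase, the Baum–Welch algorithm re-estimates HMM parameters from expected state occupancies and transition counts. These quantities come from a forward-backward pass. The Python sketch below performs one re-estimation step on a single toy sequence. Its assumptions are discrete (codeword) observations, a fully connected two-state model, and made-up initial values. It is not the chapter's training procedure. Practical systems use continuous Gaussian-mixture densities and pool statistics over many utterances.

```python
import math

def forward_backward(obs, pi, A, B):
    """Scaled forward and backward passes for a discrete-observation HMM.

    Each alpha[t] is normalized to sum to 1 by scales[t]; beta[t] uses the
    same scale factors, so alpha[t][i] * beta[t][i] is the state posterior.
    """
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]
    scales = [0.0] * T
    for t in range(T):
        for j in range(N):
            if t == 0:
                prior = pi[j]
            else:
                prior = sum(alpha[t - 1][i] * A[i][j] for i in range(N))
            alpha[t][j] = prior * B[j][obs[t]]
        scales[t] = sum(alpha[t])
        alpha[t] = [x / scales[t] for x in alpha[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   / scales[t + 1] for i in range(N)]
    return alpha, beta, scales

def baum_welch_step(obs, pi, A, B):
    """One Baum-Welch (EM) re-estimation of (pi, A, B) from a single sequence.

    Returns the updated parameters and log P(obs) under the input parameters.
    """
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scales = forward_backward(obs, pi, A, B)
    # gamma[t][i] = P(state i at frame t | obs)
    gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]
    # xi_sum[i][j] = expected number of i -> j transitions
    xi_sum = [[0.0] * N for _ in range(N)]
    for t in range(T - 1):
        for i in range(N):
            for j in range(N):
                xi_sum[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                 * beta[t + 1][j] / scales[t + 1])
    new_pi = gamma[0][:]
    new_A, new_B = [], []
    for i in range(N):
        from_i = sum(gamma[t][i] for t in range(T - 1))  # expected transitions out of i
        in_i = sum(gamma[t][i] for t in range(T))        # expected frames spent in i
        new_A.append([xi_sum[i][j] / from_i for j in range(N)])
        new_B.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / in_i
                      for k in range(M)])
    log_lik = sum(math.log(c) for c in scales)
    return new_pi, new_A, new_B, log_lik

obs = [0, 1, 2, 2, 1, 0, 0, 2]  # hypothetical codeword sequence
pi, A, B = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
history = []
for _ in range(5):
    pi, A, B, ll = baum_welch_step(obs, pi, A, B)
    history.append(ll)
print(history)
```

Because Baum–Welch is an instance of expectation-maximization, the recorded log-likelihoods should never decrease from one iteration to the next. However, the algorithm converges only to a local optimum, so the initial parameters matter.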
Copyright information
© 2012 Dia AbuZeina
About this chapter
Cite this chapter
AbuZeina, D., Elshafei, M. (2012). An Overview of Speech Recognition Systems. In: Cross-Word Modeling for Arabic Speech Recognition. SpringerBriefs in Electrical and Computer Engineering. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-1213-7_1
DOI: https://doi.org/10.1007/978-1-4614-1213-7_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-1212-0
Online ISBN: 978-1-4614-1213-7
eBook Packages: Engineering, Engineering (R0)