Abstract
Speech is the primary means of communication between humans. Research in automatic speech recognition by machines has attracted a great deal of attention for five decades, for reasons ranging from technological curiosity about the mechanisms for mechanical realization of human speech capabilities to the desire to automate simple tasks that require human–machine interaction.
Keywords
- Speech Recognition
- Automatic Speech Recognition
- Dynamic Time Warping
- Speech Recognition System
- Spontaneous Speech
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Furui, S. (2010). History and Development of Speech Recognition. In: Chen, F., Jokinen, K. (eds) Speech Technology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-73819-2_1
DOI: https://doi.org/10.1007/978-0-387-73819-2_1
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-73818-5
Online ISBN: 978-0-387-73819-2
eBook Packages: Engineering, Engineering (R0)