Abstract
Speech read from a text, and similar types of speech such as newspaper reading or broadcast news, can be recognized with high accuracy, but recognition accuracy decreases drastically for spontaneous speech. This is because spontaneous speech and read speech differ significantly, both acoustically and linguistically. This paper reports an analysis of spontaneous speech and recognition experiments using a large-scale spontaneous speech database, the “Corpus of Spontaneous Japanese (CSJ)”. The recognition results show that accuracy increases significantly as a function of the size of both the acoustic and the language model training data, and that the improvement levels off at approximately 7M words of training data. This means that the acoustic and linguistic variation of spontaneous speech is so large that a very large corpus is needed to cover it. Spectral analysis of various utterance styles in the CSJ shows that the spectral distribution of phonemes, and the spectral difference between them, is significantly reduced in spontaneous speech compared to read speech. Experimental results also show a strong correlation between the mean spectral distance between phonemes and phoneme recognition accuracy. This indicates that spectral reduction is one major reason for the lower recognition accuracy of spontaneous speech.
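The relationship described in the abstract, between the mean spectral distance between phonemes and recognition accuracy, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: Euclidean distance stands in for the spectral distance measure, the function names are hypothetical, the "reduction" is modeled as a simple shrinkage of phoneme means toward their centroid, and the vowel vectors are made-up toy values rather than CSJ data.

```python
import math
from itertools import combinations


def mean_phoneme_distance(phoneme_means):
    """Average distance over all pairs of per-phoneme mean vectors.

    Assumption: Euclidean distance stands in for the spectral distance
    measure; the abstract does not say which measure the paper uses.
    """
    pairs = list(combinations(phoneme_means.values(), 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def shrink_toward_centroid(phoneme_means, factor):
    """Hypothetical model of spectral reduction: pull every phoneme mean
    toward the overall centroid by `factor` (1.0 leaves it unchanged)."""
    vectors = list(phoneme_means.values())
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return {
        name: tuple(c + factor * (x - c) for x, c in zip(vec, centroid))
        for name, vec in phoneme_means.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy, made-up 2-D "spectral" means for three vowels (not CSJ data).
read_style = {"a": (0.0, 0.0), "i": (3.0, 4.0), "u": (6.0, 8.0)}
reduced_style = shrink_toward_centroid(read_style, 0.5)

print(mean_phoneme_distance(read_style))
print(mean_phoneme_distance(reduced_style))
```

Given one mean distance and one measured phoneme accuracy per speaker or speaking style, `pearson(distances, accuracies)` would quantify the kind of correlation the abstract refers to. The accuracy values have to come from an actual recognition experiment, so none are supplied here.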
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Furui, S., Nakamura, M., Ichiba, T., Iwano, K. (2005). Why Is the Recognition of Spontaneous Speech so Hard? In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_3
DOI: https://doi.org/10.1007/11551874_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science (R0)