Why Is the Recognition of Spontaneous Speech so Hard?

Furui, Sadaoki; Nakamura, Masanobu; Ichiba, Tomohisa; Iwano, Koji

doi:10.1007/11551874_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3658)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Although read speech and similar types of speech, e.g. newspaper readings or broadcast news, can be recognized with high accuracy, recognition accuracy decreases drastically for spontaneous speech. This is because spontaneous speech and read speech differ significantly, both acoustically and linguistically. This paper reports on the analysis and recognition of spontaneous speech using a large-scale spontaneous speech database, the “Corpus of Spontaneous Japanese (CSJ)”. Recognition results show that accuracy increases significantly with the size of both the acoustic and the language model training data, and that the improvement levels off at approximately 7M words of training data. This means that the acoustic and linguistic variation of spontaneous speech is so large that a very large corpus is needed to cover it. Spectral analysis of various utterance styles in the CSJ shows that the spectral differences between phonemes are significantly reduced in spontaneous speech compared with read speech. Experimental results also show a strong correlation between the mean spectral distance between phonemes and phoneme recognition accuracy. This indicates that spectral reduction is one major reason for the lower recognition accuracy of spontaneous speech.
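
The two quantities the abstract relates, the mean spectral distance between phonemes and its correlation with recognition accuracy, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state which distance measure or features the paper uses, so plain Euclidean distance between per-phoneme mean vectors stands in for it, and the helper names (`mean_phoneme_distance`, `shrink_toward_centroid`, `pearson`) and the toy vectors are hypothetical, not the authors' method or data.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_phoneme_distance(phoneme_means):
    """Average distance over all pairs of per-phoneme mean vectors.

    `phoneme_means` maps a phoneme label to its mean feature vector.
    Euclidean distance is an assumption; the paper's measure may differ.
    """
    pairs = list(combinations(phoneme_means.values(), 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


def shrink_toward_centroid(phoneme_means, factor):
    """Toy model of spectral reduction: pull every phoneme mean toward
    the overall centroid by `factor` (1.0 = unchanged, 0.0 = collapsed)."""
    vectors = list(phoneme_means.values())
    dims = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    return {
        label: tuple(c + factor * (x - c) for x, c in zip(vec, centroid))
        for label, vec in phoneme_means.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical 2-D "spectral" means for three phonemes (not real data).
read_style = {"a": (0.0, 0.0), "i": (3.0, 4.0), "u": (6.0, 0.0)}
reduced_style = shrink_toward_centroid(read_style, 0.5)

print(mean_phoneme_distance(read_style))
print(mean_phoneme_distance(reduced_style))
```

In this toy model, shrinking the phoneme means toward their centroid scales every pairwise distance by the same factor, which is one simple way to picture the spectral reduction the abstract describes. Given per-speaker or per-style lists of mean distances and measured phoneme accuracies, `pearson(distances, accuracies)` would quantify the kind of correlation the abstract reports; no such values are supplied here.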

Author information

Authors and Affiliations

Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8552, Japan
Sadaoki Furui, Masanobu Nakamura, Tomohisa Ichiba & Koji Iwano

Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek, Pavel Mautner & Tomáš Pavelka

About this paper

Cite this paper

Furui, S., Nakamura, M., Ichiba, T., Iwano, K. (2005). Why Is the Recognition of Spontaneous Speech so Hard? In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_3

DOI: https://doi.org/10.1007/11551874_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science, Computer Science (R0)
