Multi-factor Method for Detection of Filled Pauses and Lengthenings in Russian Spontaneous Speech

Verkhodanova, Vasilisa; Shapranov, Vladimir

doi:10.1007/978-3-319-23132-7_35

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Included in the following conference series: International Conference on Speech and Computer

Abstract

Spontaneous speech contains high rates of speech disfluencies, the most common being filled pauses and lengthenings (FPs). Human language technologies are often developed for types of speech other than spontaneous speech, and the occurrence of disfluencies causes many errors in automatic speech recognition systems. In this paper we present a method for automatic detection of FPs using a linear combination of statistical characteristics of the variance of acoustic parameters, based on a preliminary study of FP parameters across a mixed and quality-diverse corpus of Russian spontaneous speech. Experiments were carried out on a corpus consisting of the task-based dialogue corpus of Russian spontaneous speech collected at SPIIRAS and Russian casual conversations from the Open Source Multi-Language Audio Database collected at Binghamton University.
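
The abstract does not give the acoustic parameters, weights, or decision rule, so the following is only a minimal sketch of the general idea of scoring speech frames by a linear combination of variability statistics. Everything specific in it is an assumption made for illustration: the parameter names `f0` and `f1`, the window length, the weights, the threshold, and the premise that a low combined score (acoustically stable stretch) marks an FP candidate. It should not be read as the authors' implementation.

```python
from statistics import pstdev


def window_std(track, win):
    """Population standard deviation of each length-`win` window of `track`."""
    return [pstdev(track[i:i + win]) for i in range(len(track) - win + 1)]


def fp_score(tracks, weights, win):
    """Linear combination of per-window standard deviations across parameters.

    `tracks` maps a parameter name to its frame-level values and `weights`
    maps the same names to coefficients. Both are hypothetical inputs.
    """
    stds = {name: window_std(values, win) for name, values in tracks.items()}
    n = min(len(s) for s in stds.values())
    return [sum(weights[name] * stds[name][i] for name in stds) for i in range(n)]


def candidate_windows(scores, threshold):
    """Group consecutive windows whose score is below `threshold`.

    Returns inclusive (first_window, last_window) index pairs.
    """
    segments, start = [], None
    for i, score in enumerate(scores):
        if score < threshold and start is None:
            start = i
        elif score >= threshold and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))
    return segments


# Toy frame-level tracks (invented numbers): a stable stretch in the middle.
tracks = {
    "f0": [120, 135, 110, 140, 125, 125, 125, 125, 125, 125, 150, 118],
    "f1": [500, 620, 480, 700, 600, 600, 600, 600, 600, 600, 450, 640],
}
weights = {"f0": 1.0, "f1": 0.1}  # assumed coefficients, not from the paper

scores = fp_score(tracks, weights, win=3)
segments = candidate_windows(scores, threshold=1.0)
print(segments)
```

On this toy input only the windows lying entirely inside the flat stretch score below the threshold, so a single candidate segment is reported. In a real setting the weights and threshold would have to be fitted on annotated data, and the tracks would come from an acoustic front end rather than hand-written lists.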

Acknowledgements

This research is supported by a grant from the Russian Foundation for Basic Research (project No 15-06-04465) and by the Council of Grants of the President of Russia (project No MK-5209.2015.8).

Author information

Authors and Affiliations

SPIIRAS, 39, 14th Line, St. Petersburg, Russia
Vasilisa Verkhodanova
Betria Systems Inc, 50, Building 11, Ligovsky Prospekt, St. Petersburg, Russia
Vladimir Shapranov

Corresponding author

Correspondence to Vasilisa Verkhodanova.

Editor information

Editors and Affiliations

SPIIRAS, Saint-Petersburg, Russia
Andrey Ronzhin
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Patras, Patras, Greece
Nikos Fakotakis

About this paper

Cite this paper

Verkhodanova, V., Shapranov, V. (2015). Multi-factor Method for Detection of Filled Pauses and Lengthenings in Russian Spontaneous Speech. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_35

DOI: https://doi.org/10.1007/978-3-319-23132-7_35
Published: 04 September 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)
