Multi-factor Method for Detection of Filled Pauses and Lengthenings in Russian Spontaneous Speech

  • Conference paper
Speech and Computer (SPECOM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Abstract

Spontaneous speech contains high rates of speech disfluencies, the most common being filled pauses and lengthenings (FPs). Human language technologies are often developed for types of speech other than spontaneous speech, and the occurrence of disfluencies causes many errors in automatic speech recognition systems. In this paper we present a method for the automatic detection of FPs that uses a linear combination of statistical characteristics of the variance of acoustic parameters. The method is based on a preliminary study of FP parameters across a mixed, quality-diverse corpus of Russian spontaneous speech. Experiments were carried out on a corpus consisting of a task-based dialogue corpus of Russian spontaneous speech collected at SPIIRAS and Russian casual conversations from the Open Source Multi-Language Audio Database collected at Binghamton University.
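
The abstract does not specify which acoustic parameters, statistics, or weights the method uses, so the following is only a minimal Python sketch of the general idea of scoring frames by a linear combination of local variances. It assumes that FPs appear as stretches where acoustic parameters stay nearly constant. The track names (`f0`, `energy`), the weights, the window size, and the threshold are all hypothetical illustration values, not the authors' settings.

```python
from statistics import pvariance


def sliding_variance(track, window):
    """Population variance of `track` in a centered window around each frame."""
    half = window // 2
    return [
        pvariance(track[max(0, i - half):min(len(track), i + half + 1)])
        for i in range(len(track))
    ]


def combined_variance(tracks, weights, window):
    """Weighted sum (linear combination) of the per-frame variances of all tracks."""
    variances = [sliding_variance(t, window) for t in tracks]
    return [
        sum(w * v[i] for w, v in zip(weights, variances))
        for i in range(len(tracks[0]))
    ]


def segments(flags):
    """Group consecutive True flags into (start, end) frame ranges, end exclusive."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def detect_stable_regions(tracks, weights, threshold, window=3):
    """Flag frames whose combined variance is below `threshold` (assumed FP cue)."""
    scores = combined_variance(tracks, weights, window)
    return segments([s < threshold for s in scores])


# Hypothetical toy tracks: frames 4-8 are flat, the rest vary.
f0 = [100, 140, 90, 150, 120, 120, 120, 120, 120, 150, 95, 140]
energy = [0.2, 0.8, 0.3, 0.9, 0.5, 0.5, 0.5, 0.5, 0.5, 0.9, 0.2, 0.7]
found = detect_stable_regions([f0, energy], weights=(1.0, 100.0), threshold=1.0)
print(found)  # [(5, 8)]
```

Only frames whose whole window lies inside the flat stretch are flagged, which is why the detected range is narrower than the flat region itself. A real detector would presumably also apply a minimum-duration constraint and tune the weights on annotated data.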

Acknowledgements

This research is supported by a grant from the Russian Foundation for Basic Research (project No. 15-06-04465) and by the Council of Grants of the President of Russia (project No. MK-5209.2015.8).

Author information

Correspondence to Vasilisa Verkhodanova.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Verkhodanova, V., Shapranov, V. (2015). Multi-factor Method for Detection of Filled Pauses and Lengthenings in Russian Spontaneous Speech. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_35

  • DOI: https://doi.org/10.1007/978-3-319-23132-7_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23131-0

  • Online ISBN: 978-3-319-23132-7

  • eBook Packages: Computer Science (R0)
