Toward Exploring the Role of Disfluencies from an Acoustic Point of View: A New Aspect of (Dis)continuous Speech Prosody Modelling

  • Conference paper
Text, Speech, and Dialogue (TSD 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

Several studies rely on idealized, fluent utterances to model the comprehension of spoken language, and disfluencies are often regarded as mere noise in the speech flow. Other works argue that fragmented structures (disfluencies, silent and filled pauses) are important and can aid understanding. Extending the original concept of speech disfluency to the acoustic level, this paper treats the discontinuity of the F0 contour as a parallel to speech disfluencies. It presents an exhaustive analysis of the advantages and disadvantages of using a continuous F0 estimate in prosodic event detection tasks, for both formal and informal speaking styles. The results suggest that, unlike in read (formal) speech, using a continuous F0 curve interpolated throughout is counterproductive in spontaneous (informal) speech. Comparing the behaviour of speech disfluencies with the effect of F0 discontinuity raises more general questions of modelling philosophy: the results suggest that disfluencies in informal speech may themselves be informative entities, reflected also in the acoustic-level organization of speech, and hence that disfluencies in general are an important perceptual cue in human speech understanding.
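
To make the contrast between a discontinuous and a continuous F0 contour concrete, the sketch below turns a frame-level F0 track with unvoiced gaps into a continuous one. It is only an illustration under stated assumptions: the abstract does not say which interpolation method the paper uses, so linear interpolation, the use of `None` for unvoiced frames, and the function name `interpolate_f0` are all hypothetical choices made here.

```python
from typing import List, Optional


def interpolate_f0(f0: List[Optional[float]]) -> List[Optional[float]]:
    """Fill unvoiced frames (None) by linear interpolation between voiced neighbours.

    Assumption for illustration only: leading and trailing unvoiced frames
    take the value of the nearest voiced frame. A track with no voiced
    frames is returned unchanged.
    """
    voiced = [i for i, value in enumerate(f0) if value is not None]
    if not voiced:
        return list(f0)

    out = list(f0)

    # Extend the first and last voiced values to the edges of the track.
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]

    # Fill every interior gap between two consecutive voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            out[i] = f0[left] + weight * (f0[right] - f0[left])
    return out


# Discontinuous track in Hz: None marks an unvoiced frame or a pause.
track = [None, 120.0, None, None, 150.0, None]
continuous = interpolate_f0(track)
```

Here `track` keeps the gaps that the paper places in parallel with disfluencies, while `continuous` bridges them. The abstract's finding is that the bridged version may help in read speech but is counterproductive in spontaneous speech.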

Author information

Corresponding author

Correspondence to György Szaszák.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Szaszák, G., Beke, A. (2015). Toward Exploring the Role of Disfluencies from an Acoustic Point of View: A New Aspect of (Dis)continuous Speech Prosody Modelling. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_42

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)