A Tagged Corpus-Based Study for Repeats and Self-repairs Detection in French Transcribed Speech

  • Conference paper
Text, Speech and Dialogue (TSD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Abstract

This paper presents the results of a tagged corpus-based study of two kinds of disfluencies, repeats and self-repairs, in a corpus of spontaneous spoken French. The work first investigates the linguistic features of both phenomena, and then shows how, starting from corpus output tagged with TreeTagger, repeats and self-repairs can be taken into account using a word N-gram model and rule-based pattern matching. Finally, some results on a test corpus are presented.
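The abstract gives no implementation details, so the following is only a minimal sketch of what N-gram-based repeat detection and a rule-based pattern over POS tags might look like on tagged output. It assumes tab-separated `word<TAB>tag<TAB>lemma` lines, the layout TreeTagger commonly emits. The function names (`parse_tagged`, `find_repeats`, `find_same_tag_substitutions`), the `max_n` limit, the choice of tags, and the sample sentence are all hypothetical illustrations, not the author's rules or data.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (lowercased word, POS tag)


def parse_tagged(text: str) -> List[Token]:
    """Parse 'word<TAB>tag<TAB>lemma' lines into (word, tag) pairs.

    Assumption: one token per line, tab-separated, tag in the second column.
    """
    tokens: List[Token] = []
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            tokens.append((fields[0].lower(), fields[1]))
    return tokens


def find_repeats(tokens: List[Token], max_n: int = 3) -> List[Tuple[int, int]]:
    """Return (start, n) for each word n-gram immediately followed by an
    identical copy of itself, trying the longest n first."""
    hits: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(max_n, 0, -1):
            first = [w for w, _ in tokens[i:i + n]]
            second = [w for w, _ in tokens[i + n:i + 2 * n]]
            if len(second) == n and first == second:
                hits.append((i, n))
                i += n  # move to the second copy so triple repeats are also seen
                matched = True
                break
        if not matched:
            i += 1
    return hits


def find_same_tag_substitutions(
    tokens: List[Token], tags: Tuple[str, ...] = ("DET:ART", "PRP")
) -> List[int]:
    """Hypothetical self-repair rule: two adjacent, different words that share
    one of the given function-word tags (e.g. 'le la')."""
    hits: List[int] = []
    for i in range(len(tokens) - 1):
        (w1, t1), (w2, t2) = tokens[i], tokens[i + 1]
        if t1 == t2 and t1 in tags and w1 != w2:
            hits.append(i)
    return hits


# Invented sample, not taken from the paper's corpus.
SAMPLE = (
    "je\tPRO:PER\tje\n"
    "je\tPRO:PER\tje\n"
    "vais\tVER:pres\taller\n"
    "à\tPRP\tà\n"
    "le\tDET:ART\tle\n"
    "la\tDET:ART\tle\n"
    "maison\tNOM\tmaison"
)

if __name__ == "__main__":
    toks = parse_tagged(SAMPLE)
    print(find_repeats(toks))                 # repeated n-grams as (start, n)
    print(find_same_tag_substitutions(toks))  # candidate self-repair positions
```

Matching on word forms finds repeats, while the tag-based rule only proposes self-repair candidates; a real system would need further constraints, since legitimate sequences can also share a tag.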

Editor information

Petr Sojka Aleš Horák Ivan Kopeček Karel Pala

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bove, R. (2008). A Tagged Corpus-Based Study for Repeats and Self-repairs Detection in French Transcribed Speech. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_35

  • DOI: https://doi.org/10.1007/978-3-540-87391-4_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87390-7

  • Online ISBN: 978-3-540-87391-4

  • eBook Packages: Computer Science, Computer Science (R0)
