Automatic Detection of Hyperarticulated Speech

  • Conference paper
Advances in Speech and Language Technologies for Iberian Languages (IberSPEECH 2016)

Abstract

Hyperarticulation is a speech adaptation in which speakers adopt a clearer form of speech in an attempt to be better understood. However, it has the opposite effect when talking to ASR systems, as they are not trained on this kind of speech. We present approaches for the automatic detection of hyperarticulation, which can be used to improve the performance of spoken dialog systems. We performed experiments on Let’s Go data, using multiple feature sets and two classification approaches. Many relevant features are speaker dependent. Thus, we used the first turn in each dialog as the reference for the speaker, since it is typically not hyperarticulated. Our best results were above 80% accuracy, which represents an improvement of at least 11.6 percentage points over previously obtained results on similar data. We also assessed the classifiers’ performance in scenarios where hyperarticulation is rare, achieving around 98% accuracy using different confidence thresholds.
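The two ideas in the abstract, using the first turn of each dialog as a speaker reference and applying a confidence threshold to the classifier output, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature names (`duration_s`, `mean_pitch_hz`), the use of a ratio rather than a difference for normalization, the function names, and the threshold values.

```python
from typing import Dict, List

Features = Dict[str, float]


def relative_to_first_turn(turns: List[Features], eps: float = 1e-9) -> List[Features]:
    """Express each turn's features as a ratio to the dialog's first turn.

    Assumption: a ratio is used as the speaker normalization; the paper may
    combine turn and reference features differently.
    """
    if not turns:
        return []
    reference = turns[0]  # first turn, assumed not hyperarticulated
    normalized = []
    for turn in turns:
        normalized.append({
            # Fall back to 0.0 when the reference value is (near) zero.
            name: value / reference[name] if abs(reference[name]) > eps else 0.0
            for name, value in turn.items()
        })
    return normalized


def flag_hyperarticulated(probabilities: List[float], threshold: float = 0.5) -> List[bool]:
    """Flag a turn only when the classifier's confidence reaches the threshold."""
    return [p >= threshold for p in probabilities]


# Hypothetical dialog with two turns and two illustrative features.
dialog = [
    {"duration_s": 2.0, "mean_pitch_hz": 200.0},
    {"duration_s": 3.0, "mean_pitch_hz": 220.0},
]
relative = relative_to_first_turn(dialog)

# Hypothetical classifier confidences for three turns; a stricter threshold
# flags fewer turns, which matters when hyperarticulation is rare.
flags = flag_hyperarticulated([0.2, 0.7, 0.95], threshold=0.9)
```

In this sketch the first turn always maps to a ratio of 1.0 for every feature, so later turns are described only by how much they deviate from the speaker's own baseline. Raising `threshold` trades missed detections for fewer false alarms.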

Notes

  1. http://sox.sourceforge.net

  2. https://www.l2f.inesc-id.pt/spa/

Acknowledgements

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013, by Universidade de Lisboa, and by the EC H2020 project RAGE under grant agreement No 644187.

Corresponding author

Correspondence to Eugénio Ribeiro.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ribeiro, E., Batista, F., Trancoso, I., Ribeiro, R., de Matos, D.M. (2016). Automatic Detection of Hyperarticulated Speech. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_18

  • DOI: https://doi.org/10.1007/978-3-319-49169-1_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49168-4

  • Online ISBN: 978-3-319-49169-1

  • eBook Packages: Computer Science (R0)
