Automatic Detection of Hyperarticulated Speech

Ribeiro, Eugénio; Batista, Fernando; Trancoso, Isabel; Ribeiro, Ricardo; de Matos, David Martins

doi:10.1007/978-3-319-49169-1_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10077)

Included in the following conference series:

International Conference on Advances in Speech and Language Technologies for Iberian Languages

Abstract

Hyperarticulation is a speech adaptation in which speakers adopt a clearer form of speech in an attempt to improve recognition. However, it has the opposite effect when talking to ASR systems, as they are not trained on this kind of speech. We present approaches for the automatic detection of hyperarticulation, which can be used to improve the performance of spoken dialog systems. We performed experiments on Let’s Go data, using multiple feature sets and two classification approaches. Many relevant features are speaker dependent, so we used the first turn in each dialog as the reference for the speaker, since it is typically not hyperarticulated. Our best results were above 80% accuracy, an improvement of at least 11.6 percentage points over previously obtained results on similar data. We also assessed the classifiers’ performance in scenarios where hyperarticulation is rare, achieving around 98% accuracy using different confidence thresholds.
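
The two ideas in the abstract that lend themselves to a small illustration are the use of the first turn as a per-speaker reference and the use of a confidence threshold when hyperarticulation is rare. The following is a minimal sketch under stated assumptions, not the authors' implementation: the feature names, the ratio-based normalization, and the function names `normalize_by_first_turn` and `flag_hyperarticulated` are all hypothetical, and the abstract does not say how the reference turn is actually combined with later turns.

```python
from typing import Dict, List

# Hypothetical per-turn feature vector, e.g. {"pitch_mean": 180.0, "duration": 2.1}.
Turn = Dict[str, float]


def normalize_by_first_turn(dialog: List[Turn], eps: float = 1e-9) -> List[Turn]:
    """Express each turn's features relative to the dialog's first turn.

    Assumption: a simple ratio is used. A difference or z-score would be
    equally plausible; the abstract does not specify the operation.
    All turns are assumed to carry the same feature names.
    """
    if not dialog:
        return []
    reference = dialog[0]  # first turn, assumed not hyperarticulated
    normalized = []
    for turn in dialog:
        normalized.append({
            # Guard against division by a (near-)zero reference value.
            name: value / reference[name] if abs(reference[name]) > eps else 0.0
            for name, value in turn.items()
        })
    return normalized


def flag_hyperarticulated(scores: List[float], threshold: float = 0.5) -> List[bool]:
    """Flag a turn only when the classifier confidence reaches the threshold.

    Raising the threshold trades recall for fewer false alarms, which matters
    when hyperarticulated turns are rare.
    """
    return [score >= threshold for score in scores]
```

With this sketch, a dialog whose second turn has a mean pitch 1.5 times that of the first turn yields a normalized `pitch_mean` of 1.5 for that turn, independent of the speaker's absolute pitch range. The scores passed to `flag_hyperarticulated` would come from whatever classifier is used; they are placeholders here.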

Acknowledgements

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013, by Universidade de Lisboa, and by the EC H2020 project RAGE under grant agreement No 644187.

Author information

Authors and Affiliations

L2F – Spoken Language Systems Laboratory, INESC-ID Lisboa, Lisbon, Portugal
Eugénio Ribeiro, Fernando Batista, Isabel Trancoso, Ricardo Ribeiro & David Martins de Matos
Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
Eugénio Ribeiro, Isabel Trancoso & David Martins de Matos
ISCTE-IUL – Instituto Universitário de Lisboa, Lisbon, Portugal
Fernando Batista & Ricardo Ribeiro

Corresponding author

Correspondence to Eugénio Ribeiro.

Editor information

Editors and Affiliations

INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Alberto Abad
I3A/University of Zaragoza, Zaragoza, Spain
Alfonso Ortega
DETI/IEETA, University of Aveiro, Aveiro, Portugal
António Teixeira
AtlantTIC Research Center, Universidad de Vigo, Vigo, Spain
Carmen García Mateo
Universitat Politècnica de València, Valencia, Spain
Carlos D. Martínez Hinarejos
University of Coimbra, Coimbra, Portugal
Fernando Perdigão
INESC-ID/ISCTE-IUL, Lisbon, Portugal
Fernando Batista
INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Nuno Mamede

About this paper

Cite this paper

Ribeiro, E., Batista, F., Trancoso, I., Ribeiro, R., de Matos, D.M. (2016). Automatic Detection of Hyperarticulated Speech. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_18

DOI: https://doi.org/10.1007/978-3-319-49169-1_18
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49168-4
Online ISBN: 978-3-319-49169-1
eBook Packages: Computer Science, Computer Science (R0)
