Abstract
Hyperarticulation is a speech adaptation in which speakers adopt a clearer form of speech in an attempt to improve recognition. However, it has the opposite effect on automatic speech recognition (ASR) systems, since they are not trained on this kind of speech. We present approaches for the automatic detection of hyperarticulation, which can be used to improve the performance of spoken dialog systems. We performed experiments on Let’s Go data, using multiple feature sets and two classification approaches. Many of the relevant features are speaker dependent, so we used the first turn of each dialog as the reference for the speaker, since it is typically not hyperarticulated. Our best results were above 80% accuracy, an improvement of at least 11.6 percentage points over previously reported results on similar data. We also assessed the classifiers’ performance in scenarios where hyperarticulation is rare, achieving around 98% accuracy using different confidence thresholds.
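The two ideas mentioned above, using the first turn as a speaker reference and applying a confidence threshold, can be illustrated with a minimal Python sketch. The abstract does not say how the reference turn enters the features or how the thresholds are applied, so the sketch makes its own assumptions: subtracting the first-turn values, the feature names `pitch_mean` and `duration`, and the function names are all hypothetical and are not the authors' implementation.

```python
from typing import Dict, List


def relative_to_first_turn(turns: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Express each turn's features as differences from the dialog's first turn.

    Assumption: the first turn is the speaker's non-hyperarticulated reference,
    and a simple difference is used (the abstract does not specify the operation).
    """
    if not turns:
        return []
    reference = turns[0]
    return [
        {name: value - reference[name] for name, value in turn.items()}
        for turn in turns
    ]


def flag_hyperarticulated(scores: List[float], threshold: float = 0.5) -> List[bool]:
    """Flag a turn only when the classifier's confidence reaches the threshold.

    Raising the threshold makes positive predictions more conservative, which
    matters when hyperarticulated turns are rare.
    """
    return [score >= threshold for score in scores]


# Hypothetical dialog with two turns and two illustrative features.
dialog = [
    {"pitch_mean": 180.0, "duration": 1.0},
    {"pitch_mean": 200.0, "duration": 1.5},
]
relative = relative_to_first_turn(dialog)
flags = flag_hyperarticulated([0.2, 0.6, 0.9], threshold=0.8)
```

With a stricter threshold such as 0.8, only turns that the classifier scores with high confidence are flagged, trading missed detections for fewer false alarms.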
Acknowledgements
This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013, by Universidade de Lisboa, and by the EC H2020 project RAGE under grant agreement No 644187.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Ribeiro, E., Batista, F., Trancoso, I., Ribeiro, R., de Matos, D.M. (2016). Automatic Detection of Hyperarticulated Speech. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_18
DOI: https://doi.org/10.1007/978-3-319-49169-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49168-4
Online ISBN: 978-3-319-49169-1
eBook Packages: Computer Science (R0)