Improving Speech Recognition through Automatic Selection of Age Group – Specific Acoustic Models

Hämäläinen, Annika; Meinedo, Hugo; Tjalve, Michael; Pellegrini, Thomas; Trancoso, Isabel; Dias, Miguel Sales

doi:10.1007/978-3-319-09761-9_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8775)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

The acoustic models used by automatic speech recognisers are usually trained with speech collected from young to middle-aged adults. As the characteristics of speech change with age, such acoustic models tend to perform poorly on children’s and elderly people’s speech. In this study, we investigate whether automatic age group classification of speakers, combined with age group-specific acoustic models, can improve automatic speech recognition performance. We train an age group classifier with an accuracy of about 95% and show that using its output to select age group-specific acoustic models for children and the elderly leads to considerable gains in automatic speech recognition performance, compared with recognising their speech using acoustic models trained with young to middle-aged adults’ speech.
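The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the age group labels, the model identifiers, the fallback to the adult model, and the `classify` and `decoders` callables are all hypothetical, since the abstract does not specify the class inventory or the recogniser interface.

```python
from typing import Callable, Dict

# Hypothetical age group labels and model identifiers (assumptions for
# illustration; the abstract does not name the classes or the models).
ACOUSTIC_MODELS: Dict[str, str] = {
    "child": "am_children",
    "adult": "am_adults",
    "elderly": "am_elderly",
}
DEFAULT_GROUP = "adult"  # assumed fallback when the classifier gives no usable label


def select_acoustic_model(scores: Dict[str, float]) -> str:
    """Return the model identifier for the highest-scoring age group.

    Empty classifier output or an unknown label falls back to the adult model.
    """
    if not scores:
        return ACOUSTIC_MODELS[DEFAULT_GROUP]
    best_group = max(scores, key=lambda group: scores[group])
    return ACOUSTIC_MODELS.get(best_group, ACOUSTIC_MODELS[DEFAULT_GROUP])


def recognise(
    audio: bytes,
    classify: Callable[[bytes], Dict[str, float]],
    decoders: Dict[str, Callable[[bytes], str]],
) -> str:
    """Classify the speaker's age group, then decode with the matching model."""
    model_id = select_acoustic_model(classify(audio))
    return decoders[model_id](audio)
```

In this sketch, `classify` stands in for the age group classifier and each entry in `decoders` stands in for a recogniser loaded with one acoustic model, so the routing logic stays independent of any particular speech recognition toolkit.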

Author information

Authors and Affiliations

Microsoft Language Development Center & ISCTE, University Institute of Lisbon, Lisbon, Portugal
Annika Hämäläinen & Miguel Sales Dias
INESC-ID Lisboa, Lisbon, Portugal
Hugo Meinedo & Isabel Trancoso
Instituto Superior Técnico, Lisbon, Portugal
Isabel Trancoso
Microsoft & University of Washington, Seattle, WA, USA
Michael Tjalve
IRIT - Université Toulouse III - Paul Sabatier, Toulouse, France
Thomas Pellegrini

Editor information

Editors and Affiliations

FCHS, Universidade do Algarve, Campus de Gambelas, 8005-139, Faro, Portugal
Jorge Baptista
INESC-ID Lisboa, Lisbon, Portugal
Nuno Mamede
IT-University of Coimbra, Coimbra, Portugal
Sara Candeias
USP-EACH, São Paulo-SP, Brazil
Ivandré Paraboni
USP-ICMC, Universidade de São Paulo, São Carlos, SP, Brazil
Thiago A. S. Pardo
SCC-ICMC, University of São Paulo, São Carlos, SP, Brazil
Maria das Graças Volpe Nunes

About this paper

Cite this paper

Hämäläinen, A., Meinedo, H., Tjalve, M., Pellegrini, T., Trancoso, I., Dias, M.S. (2014). Improving Speech Recognition through Automatic Selection of Age Group – Specific Acoustic Models. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_2

DOI: https://doi.org/10.1007/978-3-319-09761-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09760-2
Online ISBN: 978-3-319-09761-9
eBook Packages: Computer Science (R0)
