Improving Articulatory Feature and Phoneme Recognition Using Multitask Learning

Rasipuram, Ramya; Magimai-Doss, Mathew

doi:10.1007/978-3-642-21735-7_37

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6791)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Speech sounds can be characterized by articulatory features. Articulatory features are typically estimated using a set of multilayer perceptrons (MLPs), i.e., a separate MLP is trained for each articulatory feature. In this paper, we investigate a multitask learning (MTL) approach for the joint estimation of articulatory features, with and without phoneme classification as a subtask. Our studies show that an MTL MLP can estimate articulatory features compactly and efficiently by learning the inter-feature dependencies through a common hidden layer representation. Furthermore, adding phoneme classification as a subtask while estimating articulatory features improves both articulatory feature estimation and phoneme recognition. On the TIMIT phoneme recognition task, the articulatory feature posterior probabilities obtained by the MTL MLP achieve a phoneme recognition accuracy of 73.2%, while the phoneme posterior probabilities achieve an accuracy of 74.0%.
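The shared-hidden-layer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the task names and class counts in TASKS, the 39-dimensional input, the hidden layer size, the sigmoid hidden units, and the unweighted sum of per-task cross-entropies are all assumptions chosen only for illustration. The sketch shows one hidden layer feeding a separate softmax output per articulatory feature, plus an optional phoneme output.

```python
import math
import random

# Hypothetical task inventory: names and class counts are illustrative only,
# not the feature set used in the paper.
TASKS = {"manner": 6, "place": 8, "height": 5, "phoneme": 39}


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for one input vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_mtl_mlp(n_in, n_hidden, tasks, seed=0):
    """One hidden layer shared by all tasks, one output layer per task."""
    rng = random.Random(seed)
    return {
        "hidden": init_layer(n_in, n_hidden, rng),
        "heads": {name: init_layer(n_hidden, k, rng) for name, k in tasks.items()},
    }


def forward(model, x):
    """Shared sigmoid hidden layer, then one softmax posterior per task."""
    h = [1.0 / (1.0 + math.exp(-a)) for a in affine(model["hidden"], x)]
    return {name: softmax(affine(head, h)) for name, head in model["heads"].items()}


def mtl_loss(posteriors, targets):
    """Unweighted sum of per-task cross-entropies for one frame (an assumption)."""
    return -sum(math.log(posteriors[name][targets[name]]) for name in targets)


model = init_mtl_mlp(n_in=39, n_hidden=16, tasks=TASKS)
frame = [0.0] * 39  # placeholder acoustic feature vector
posteriors = forward(model, frame)
loss = mtl_loss(posteriors, {"manner": 0, "place": 1, "height": 2, "phoneme": 3})
```

Because every output layer reads the same hidden activations, gradients from all tasks would update the shared weights during training, which is one way the inter-feature dependencies mentioned above could be captured. Dropping the "phoneme" entry from TASKS gives the variant without phoneme classification as a subtask.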

Author information

Authors and Affiliations

Idiap Research Institute, Martigny, Switzerland
Ramya Rasipuram & Mathew Magimai-Doss
Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Ramya Rasipuram

Editor information

Editors and Affiliations

Department of Information and Computer Science, Aalto University School of Science, P.O. Box 15400, 00076, Aalto, Finland
Timo Honkela & Samuel Kaski
School of Physics, Astronomy and Informatics, Department of Informatics, Nicolaus Copernicus University, ul. Grudziadzka 5, 87-100, Torun, Poland
Włodzisław Duch
Department of Statistical Science, University College London, 1-19 Torrington Place, WC1E 7HB, London, UK
Mark Girolami

About this paper

Cite this paper

Rasipuram, R., Magimai-Doss, M. (2011). Improving Articulatory Feature and Phoneme Recognition Using Multitask Learning. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2011. ICANN 2011. Lecture Notes in Computer Science, vol 6791. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21735-7_37

DOI: https://doi.org/10.1007/978-3-642-21735-7_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21734-0
Online ISBN: 978-3-642-21735-7
eBook Packages: Computer Science, Computer Science (R0)
