Time-Frequency Features Extraction for Infant Directed Speech Discrimination

Mahdhaoui, Ammar; Chetouani, Mohamed; Kessous, Loic

doi:10.1007/978-3-642-11509-7_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5933)

Included in the following conference series:

International Conference on Nonlinear Speech Processing

Abstract

In this paper, we evaluate the relevance of a perceptual spectral model for automatic motherese detection. We investigate several classification techniques often used in emotion recognition: Gaussian Mixture Models (GMM), Support Vector Machines, neural networks, and k-nearest neighbors. Classification experiments were carried out on short, manually pre-segmented speech and motherese segments extracted from family home movies (mean duration of approximately 3 s). Accuracies of around 86% were obtained on speaker-independent speech data, and 87.5% in the last study with speaker-dependent data. A GMM trained on the spectral MFCC features gives the best score, outperforming the other single classifiers. We also found that fusing classifiers that use spectral features with classifiers that use prosodic information usually improves discrimination between motherese and normal-directed speech (around 86% accuracy).
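
As an illustration of the classifier fusion mentioned in the abstract, the following minimal sketch combines the outputs of a spectral classifier and a prosodic classifier by a weighted sum of their posterior probabilities. The abstract does not state which fusion rule the authors used, so the weighted-sum rule, the function names, the weight, the threshold, and the example posteriors are all assumptions made for illustration only.

```python
def fuse_posteriors(p_spectral, p_prosodic, weight=0.5):
    """Weighted-sum fusion of two posteriors P(motherese | segment).

    `weight` is the share given to the spectral classifier (assumed rule,
    not necessarily the one used in the paper).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_spectral + (1.0 - weight) * p_prosodic


def classify(p_spectral, p_prosodic, weight=0.5, threshold=0.5):
    """Label a segment from the fused posterior."""
    fused = fuse_posteriors(p_spectral, p_prosodic, weight)
    return "motherese" if fused >= threshold else "normal"


# Hypothetical per-segment posteriors: (spectral classifier, prosodic classifier)
segments = [(0.9, 0.6), (0.4, 0.7), (0.2, 0.3)]
labels = [classify(ps, pp, weight=0.6) for ps, pp in segments]
print(labels)
```

In practice, the fusion weight would be tuned on held-out data, and each posterior would come from a trained model such as a GMM over MFCC frames or a classifier over prosodic statistics.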

Author information

Authors and Affiliations

UPMC Univ Paris 06, F-75005, Paris, France; CNRS, UMR 7222 ISIR, Institut des Systèmes Intelligents et de Robotique, F-75005, Paris, France
Ammar Mahdhaoui, Mohamed Chetouani & Loic Kessous

Editor information

Editors and Affiliations

Department of Computer Science, Escola Politecnica Superior, Universitat de Vic, c/ Sagrada Familia, 7, 08500, Vic (Barcelona), Spain
Jordi Solé-Casals
Department of Computer Science, Escola Politecnica Superior, Universitat de Vic, c/ Sagrada Familia, 7, 08500, Vic (Barcelona), Spain
Vladimir Zaiats

About this paper

Cite this paper

Mahdhaoui, A., Chetouani, M., Kessous, L. (2010). Time-Frequency Features Extraction for Infant Directed Speech Discrimination. In: Solé-Casals, J., Zaiats, V. (eds) Advances in Nonlinear Speech Processing. NOLISP 2009. Lecture Notes in Computer Science, vol 5933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11509-7_16

DOI: https://doi.org/10.1007/978-3-642-11509-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11508-0
Online ISBN: 978-3-642-11509-7
eBook Packages: Computer Science, Computer Science (R0)
