Abstract
In this paper we evaluate the relevance of a perceptual spectral model for automatic motherese detection. We investigated several classification techniques often used in emotion recognition: Gaussian Mixture Models (GMM), Support Vector Machines, neural networks, and k-nearest neighbors. Classification experiments were carried out on short, manually pre-segmented speech and motherese segments (mean duration approximately 3 s) extracted from family home movies. Accuracies of around 86% were obtained on speaker-independent speech data, and 87.5% in the last study with speaker-dependent data. We found that a GMM trained on spectral MFCC features gives the best score, outperforming all other single classifiers. We also found that fusing classifiers that use spectral features with classifiers that use prosodic information usually improves discrimination between motherese and normal-directed speech (around 86% accuracy).
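The fusion of spectral and prosodic classifiers mentioned above can be illustrated with a minimal sketch. It assumes score-level fusion by a weighted sum of each classifier's posterior probability for the motherese class. The abstract does not state the fusion rule, so the function names, the `weight` and `threshold` parameters, and the sample values are all hypothetical.

```python
def fuse_posteriors(p_spectral, p_prosodic, weight=0.5):
    """Weighted-sum fusion of two classifiers' P(motherese | segment).

    Assumption: both inputs are posteriors in [0, 1]; `weight` is the
    share given to the spectral classifier (hypothetical parameter).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_spectral + (1.0 - weight) * p_prosodic


def classify(p_spectral, p_prosodic, weight=0.5, threshold=0.5):
    """Label a segment by thresholding the fused posterior."""
    fused = fuse_posteriors(p_spectral, p_prosodic, weight)
    return "motherese" if fused >= threshold else "normal"


# Illustrative posteriors for three segments (made-up values, not results
# from the paper): (spectral classifier, prosodic classifier).
segments = [(0.9, 0.6), (0.4, 0.3), (0.45, 0.8)]
labels = [classify(ps, pp, weight=0.5) for ps, pp in segments]
```

In the third segment the two classifiers disagree, and the fused score follows the more confident one. In practice the weight would be tuned on held-out data rather than fixed at 0.5.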
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mahdhaoui, A., Chetouani, M., Kessous, L. (2010). Time-Frequency Features Extraction for Infant Directed Speech Discrimination. In: Solé-Casals, J., Zaiats, V. (eds) Advances in Nonlinear Speech Processing. NOLISP 2009. Lecture Notes in Computer Science, vol 5933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11509-7_16
DOI: https://doi.org/10.1007/978-3-642-11509-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11508-0
Online ISBN: 978-3-642-11509-7
eBook Packages: Computer Science, Computer Science (R0)