Time-Frequency Features Extraction for Infant Directed Speech Discrimination

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5933)

Abstract

In this paper we evaluate the relevance of a perceptual spectral model for automatic motherese detection. We investigated various classification techniques often used in emotion recognition (Gaussian mixture models, support vector machines, neural networks, and k-nearest neighbors). Classification experiments were carried out on short, manually pre-segmented speech and motherese segments extracted from family home movies, with a mean duration of approximately 3 s. Accuracies of around 86% were obtained when testing on speaker-independent speech data, compared with 87.5% in the last study with speaker-dependent data. We found that a GMM trained on the MFCC spectral features gives the best score, outperforming all the other single classifiers. We also found that fusing classifiers that use spectral features with classifiers that use prosodic information usually improves discrimination between motherese and normal-directed speech (around 86% accuracy).
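The fusion of a spectral classifier with a prosodic classifier can be illustrated with a minimal sketch. The abstract does not state which fusion rule was used, so the weighted sum of posterior probabilities below is an assumption, as are the function names, the `weight` and `threshold` parameters, and the label strings.

```python
def fuse_posteriors(p_spectral, p_prosodic, weight=0.5):
    """Weighted-sum fusion of two motherese posteriors (assumed rule).

    p_spectral: P(motherese | segment) from a spectral (e.g. MFCC) classifier.
    p_prosodic: P(motherese | segment) from a prosodic classifier.
    weight:     hypothetical mixing weight given to the spectral classifier.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_spectral + (1.0 - weight) * p_prosodic


def classify(p_spectral, p_prosodic, weight=0.5, threshold=0.5):
    """Label a segment from the fused posterior (hypothetical labels)."""
    fused = fuse_posteriors(p_spectral, p_prosodic, weight)
    return "motherese" if fused >= threshold else "other"
```

With `weight=1.0` the decision reduces to the spectral classifier alone, and with `weight=0.0` to the prosodic classifier alone, so intermediate values trade off the two information sources.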

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mahdhaoui, A., Chetouani, M., Kessous, L. (2010). Time-Frequency Features Extraction for Infant Directed Speech Discrimination. In: Solé-Casals, J., Zaiats, V. (eds) Advances in Nonlinear Speech Processing. NOLISP 2009. Lecture Notes in Computer Science, vol 5933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11509-7_16

  • DOI: https://doi.org/10.1007/978-3-642-11509-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11508-0

  • Online ISBN: 978-3-642-11509-7

  • eBook Packages: Computer Science, Computer Science (R0)
