Automatic Detection of Depressive States from Speech

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 69)

Abstract

This paper investigates the acoustic and perceptual speech features that differentiate a depressed individual from a healthy one. The speech data were collected from both healthy and depressed Italian-speaking subjects, with each recording session comprising a read narrative and a spontaneous narrative. The dataset was pre-processed by extracting Mel Frequency Cepstral Coefficients (MFCCs). The resulting features were further processed using Principal Component Analysis (PCA) for correlation analysis and dimensionality reduction. The two groups were found to differ with respect to the extracted speech features. To distinguish the depressed group from the healthy one on the basis of the proposed speech processing algorithm, the Self Organizing Map (SOM) algorithm was used. The clustering accuracy obtained with the SOM was 80.67%.
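
The final clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the SOM configuration or how clustering accuracy was computed, so the two-node 1-D map, the decay schedules, the purity-style accuracy measure, and all function names below are assumptions made for illustration. The synthetic 2-D points merely stand in for PCA-reduced MFCC features and carry no clinical meaning.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (Euclidean)."""
    dists = [sum((wk - xk) ** 2 for wk, xk in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_som(data, n_nodes=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOM (hypothetical settings) and return its node weights."""
    rng = random.Random(seed)
    # Start each node at a distinct, randomly chosen sample.
    weights = [list(x) for x in rng.sample(data, n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                   # learning rate shrinks linearly
        sigma = max(sigma0 * decay, 1e-3)  # neighbourhood width shrinks too
        order = list(data)
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid of node indices.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


def clustering_accuracy(cluster_ids, labels):
    """Purity: each cluster votes for its majority label (an assumed metric)."""
    counts = {}
    for c, y in zip(cluster_ids, labels):
        by_label = counts.setdefault(c, {})
        by_label[y] = by_label.get(y, 0) + 1
    correct = sum(max(by_label.values()) for by_label in counts.values())
    return correct / len(labels)


def make_toy_features(n_per_group=30, seed=1):
    """Synthetic 2-D stand-ins for PCA-reduced features; not real speech data."""
    rng = random.Random(seed)
    data, labels = [], []
    for label, centre in (("healthy", (0.0, 0.0)), ("depressed", (10.0, 10.0))):
        for _ in range(n_per_group):
            data.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            labels.append(label)
    return data, labels


def demo():
    """Cluster the toy data with the SOM and score it against the group labels."""
    data, labels = make_toy_features()
    weights = train_som(data)
    clusters = [best_matching_unit(weights, x) for x in data]
    return clustering_accuracy(clusters, labels)
```

Calling `demo()` returns the fraction of samples that fall in a cluster whose majority label matches their own. The group labels are used only for scoring, never for training, which is what makes this a clustering evaluation rather than a supervised classifier.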

Acknowledgements

The patients, healthy subjects (with typical speech), and doctors (psychiatrists) are acknowledged for their involvement in and contribution to this research. The International Institute for Advanced Scientific Studies (IIASS) and Professor Ferdinando Mancini (President, IIASS) are acknowledged for having supported the first author during her internship.

Author information

Corresponding author

Correspondence to Anna Esposito.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Mendiratta, A. et al. (2018). Automatic Detection of Depressive States from Speech. In: Esposito, A., Faudez-Zanuy, M., Morabito, F., Pasero, E. (eds) Multidisciplinary Approaches to Neural Computing. Smart Innovation, Systems and Technologies, vol 69. Springer, Cham. https://doi.org/10.1007/978-3-319-56904-8_29

  • DOI: https://doi.org/10.1007/978-3-319-56904-8_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56903-1

  • Online ISBN: 978-3-319-56904-8

  • eBook Packages: Engineering, Engineering (R0)
