
Speech Analytics for Medical Applications

  • Conference paper
Text, Speech, and Dialogue (TSD 2018)

Abstract

Speech has the potential to provide a rich biomarker for health, offering a non-invasive route to the early diagnosis and monitoring of a range of conditions related to human physiology and cognition. With the rise of speech-related machine learning applications over the last decade, there has been growing interest in developing speech-based tools that perform non-invasive diagnosis. This talk covers two aspects of this trend. The first is the collection of large in-the-wild multimodal datasets in which the speech of the subject is affected by certain medical conditions. Our mining effort has focused on video blogs (vlogs) and explores audio, video, text and metadata cues in order to retrieve vlogs that feature a single speaker who, at some point, states that he or she is currently affected by a given disease. The second aspect is patient privacy. In this context, we explore recent developments in cryptography, in particular Fully Homomorphic Encryption, to develop an encrypted version of a neural network trained with unencrypted data, in order to produce encrypted predictions of health-related labels. As a proof of concept, we have selected two target diseases, Cold and Depression, to show our results and discuss these two aspects.
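
The privacy idea in the abstract (train on unencrypted data, then predict on encrypted inputs) can be illustrated with a small sketch. The abstract does not describe the network itself, so everything below is an assumption made for illustration: the layer sizes, the integer weights, and the square activation, which is a common choice in CryptoNets-style designs because homomorphic schemes typically support only addition and multiplication. `MockCiphertext` is a hypothetical stand-in that provides no security; it only mimics the restricted interface of a ciphertext.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MockCiphertext:
    """Hypothetical stand-in for an FHE ciphertext (NOT real encryption).

    It exposes only addition and multiplication, which is what the
    encrypted inference path is limited to in this sketch.
    """
    value: int

    def __add__(self, other):
        other_value = other.value if isinstance(other, MockCiphertext) else other
        return MockCiphertext(self.value + other_value)

    def __mul__(self, other):
        other_value = other.value if isinstance(other, MockCiphertext) else other
        return MockCiphertext(self.value * other_value)


# Illustrative weights, assumed to come from ordinary plaintext training
# and to be already quantised to integers.
W1 = [[1, -2], [3, 1]]
B1 = [0, -1]
W2 = [[1, -1]]
B2 = [2]


def dense(inputs, weights, biases):
    """Affine layer: ciphertext inputs, plaintext integer weights and biases."""
    outputs = []
    for row, bias in zip(weights, biases):
        acc = inputs[0] * row[0]
        for x, w in zip(inputs[1:], row[1:]):
            acc = acc + x * w
        outputs.append(acc + bias)
    return outputs


def square(xs):
    """Polynomial activation: uses multiplication only, no comparisons."""
    return [x * x for x in xs]


def predict(encrypted_inputs):
    """Forward pass built solely from additions and multiplications."""
    hidden = square(dense(encrypted_inputs, W1, B1))
    return dense(hidden, W2, B2)


features = [2, 1]  # made-up quantised input features
encrypted = [MockCiphertext(v) for v in features]
score = predict(encrypted)[0]
print(score.value)  # -34; in a real system only the key holder could read this
```

In an actual deployment the mock class would be replaced by ciphertexts from a homomorphic encryption library, and the server would return the encrypted score without ever seeing the features or the prediction.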

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with references UID/CEC/50021/2013, and SFRH/BD/103402/2014.

Notes

  1. https://www.research.ibm.com/5-in-5/mental-health/

  2. The WSM corpus also includes a subset for Parkinson’s Disease, which we excluded for two reasons: space constraints, and the fact that the corresponding lab dataset targets a regression task rather than a classification task.

Author information

Corresponding author

Correspondence to Isabel Trancoso.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Trancoso, I., Correia, J., Teixeira, F., Raj, B., Abad, A. (2018). Speech Analytics for Medical Applications. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_3

  • DOI: https://doi.org/10.1007/978-3-030-00794-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00793-5

  • Online ISBN: 978-3-030-00794-2

  • eBook Packages: Computer Science, Computer Science (R0)
