Speech Analytics for Medical Applications

  • Isabel TrancosoEmail author
  • Joana Correia
  • Francisco Teixeira
  • Bhiksha Raj
  • Alberto Abad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)


Speech has the potential to provide a rich bio-marker for health, allowing a non-invasive route to early diagnosis and monitoring of a range of conditions related to human physiology and cognition. With the rise of speech related machine learning applications over the last decade, there has been a growing interest in developing speech based tools that perform non-invasive diagnosis. This talk covers two aspects related to this growing trend. One is the collection of large in-the-wild multimodal datasets in which the speech of the subject is affected by certain medical conditions. Our mining effort has been focused on video blogs (vlogs), and explores audio, video, text and metadata cues, in order to retrieve vlogs that include a single speaker which, at some point, admits that he/she is currently affected by a given disease. The second aspect is patient privacy. In this context, we explore recent developments in cryptography and, in particular in Fully Homomorphic Encryption, to develop an encrypted version of a neural network trained with unencrypted data, in order to produce encrypted predictions of health-related labels. As a proof-of-concept, we have selected two target diseases: Cold and Depression, to show our results and discuss these two aspects.


Pathological speech Data mining Cryptography 


  1. 1.
    Boufounos, P., Rane, S.: Secure binary embeddings for privacy preserving nearest neighbors. In: International Workshop on Information Forensics and Security (WIFS) (2011)Google Scholar
  2. 2.
    Chabanne, H., de Wargny, A., Milgram, J., Morel, C., et al.: Privacy-preserving classification on deep neural network. IACR Cryptology ePrint Archive 2017, 35 (2017)Google Scholar
  3. 3.
    Chollet, F., et al.: Keras (2015).
  4. 4.
    Correia, J., Raj, B., Trancoso, I., Teixeira, F.: Mining multimodal repositories for speech affecting diseases. In: Interspeech (2018)Google Scholar
  5. 5.
    Cummins, N., Scherer, S., Krajewski, J., Schnieder, S., Epps, J., Quatieri, T.F.: A review of depression and suicide risk assessment using speech analysis. Speech Commun. 71, 10–49 (2015)CrossRefGoogle Scholar
  6. 6.
    Degottex, G., Kane, J., Drugman, T., Raitio, T., Scherer, S.: COVAREP - a collaborative voice analysis repository for speech technologies. In: ICASSP, pp. 960–964, May 2014.
  7. 7.
    Dias, M., Abad, A., Trancoso, I.: Exploring hashing and cryptonet based approaches for privacy-preserving speech emotion recognition. In: ICASSP. IEEE (2018)Google Scholar
  8. 8.
    Dibazar, A.A., Narayanan, S., Berger, T.W.: Feature analysis for automatic detection of pathological speech. In: 24th Annual Conference and the Annual Fall Meeting of the Biomedical Engineering Society EMBS/BMES Conference, vol. 1, pp. 182–183. IEEE (2002)Google Scholar
  9. 9.
    Eyben, F., Scherer, K., Schuller, B., Sundberg, J., et al.: The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans. Affect. Comput. 7(2), 190–202 (2016)CrossRefGoogle Scholar
  10. 10.
    Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive 2012, 144 (2012). Informal publicationGoogle Scholar
  11. 11.
    Geitgey, A.: Facerecog (2017).
  12. 12.
    Gilad-Bachrach, R., Dowlin, N., Laine, K., et al.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In: ICML. JMLR Workshop and Conference Proceedings, vol. 48, pp. 201–210 (2016)Google Scholar
  13. 13.
    Hesamifard, E., Takabi, H., Ghasemi, M.: CryptoDL: deep neural networks over encrypted data. CoRR abs/1711.05189 (2017)Google Scholar
  14. 14.
    Lopez-de Ipiña, K., et al.: On automatic diagnosis of Alzheimer’s disease based on spontaneous speech analysis and emotional temperature. Cogn. Comput. 7(1), 44–55 (2015)CrossRefGoogle Scholar
  15. 15.
    López-de Ipiña, K., et al.: On the selection of non-invasive methods based on speech analysis oriented to automatic Alzheimer disease diagnosis. Sensors 13(5), 6730–6745 (2013)CrossRefGoogle Scholar
  16. 16.
    Kroenke, K., Strine, T.W., Spitzer, R.L., Williams, J.B., Berry, J.T., Mokdad, A.H.: The PHQ-8 as a measure of current depression in the general population. J. Affect Disord 114(1–3), 163–173 (2009)CrossRefGoogle Scholar
  17. 17.
    Laine, K., Chen, H., Player, R.: Simple encrypted arithmetic library - SEAL v2.3.0. Technical report, Microsoft, December 2017.
  18. 18.
    Orozco-Arroyave, J.R., et al.: Characterization methods for the detection of multiple voice disorders: neurological, functional, and laryngeal diseases. IEEE J. Biomed. Health Inform. 19(6), 1820–1828 (2015)CrossRefGoogle Scholar
  19. 19.
    Pathak, M.A., Raj, B.: Privacy-preserving speaker verification and identification using gaussian mixture models. IEEE Trans. Audio Speech Lang. Process. 21(2), 397–406 (2013). Scholar
  20. 20.
    Rivest, R.L., Adleman, L., Dertouzos, M.L.: On data banks and privacy homomorphisms. Found. Secure Comput. 169–179 (1978)Google Scholar
  21. 21.
    Rouvier, M., Dupuy, G., Gay, P., Khoury, E., Merlin, T., Meignier, S.: An open-source state-of-the-art toolbox for broadcast news diarization. In: Interspeech (2013)Google Scholar
  22. 22.
    Schuller, B., et al.: The Interspeech 2017 computational paralinguistics challenge: addressee, cold & snoring. In: Interspeech (2017)Google Scholar
  23. 23.
    Socher, R., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: EMNLP 2013, pp. 1631–1642 (2013)Google Scholar
  24. 24.
    Teixeira, F., Abad, A., Trancoso, I.: Patient privacy in paralinguistic tasks. In: Interspeech (2018)Google Scholar
  25. 25.
    Valstar, M.F., et al.: AVEC 2016 - depression, mood, and emotion recognition workshop and challenge. CoRR abs/1605.01600 (2016).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. 1.INESC-ID/Instituto Superior TécnicoUniversity of LisbonLisbonPortugal
  2. 2.Carnegie Mellon UniversityPittsburghUSA

Personalised recommendations