Speech Analytics for Medical Applications

Trancoso, Isabel; Correia, Joana; Teixeira, Francisco; Raj, Bhiksha; Abad, Alberto

doi:10.1007/978-3-030-00794-2_3

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11107))

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

1428 Accesses
1 Citations
1 Altmetric

Abstract

Speech has the potential to provide a rich bio-marker for health, allowing a non-invasive route to early diagnosis and monitoring of a range of conditions related to human physiology and cognition. With the rise of speech related machine learning applications over the last decade, there has been a growing interest in developing speech based tools that perform non-invasive diagnosis. This talk covers two aspects related to this growing trend. One is the collection of large in-the-wild multimodal datasets in which the speech of the subject is affected by certain medical conditions. Our mining effort has been focused on video blogs (vlogs), and explores audio, video, text and metadata cues, in order to retrieve vlogs that include a single speaker which, at some point, admits that he/she is currently affected by a given disease. The second aspect is patient privacy. In this context, we explore recent developments in cryptography and, in particular in Fully Homomorphic Encryption, to develop an encrypted version of a neural network trained with unencrypted data, in order to produce encrypted predictions of health-related labels. As a proof-of-concept, we have selected two target diseases: Cold and Depression, to show our results and discuss these two aspects.

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with references UID/CEC/50021/2013, and SFRH/BD/103402/2014.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://www.research.ibm.com/5-in-5/mental-health/.
2.
The WSM corpus also includes a subset for Parkinson’s Disease, which we excluded for two reasons: space concerns, and the fact that the corresponding lab dataset is aimed at a regression, and not classification task.

References

Boufounos, P., Rane, S.: Secure binary embeddings for privacy preserving nearest neighbors. In: International Workshop on Information Forensics and Security (WIFS) (2011)
Google Scholar
Chabanne, H., de Wargny, A., Milgram, J., Morel, C., et al.: Privacy-preserving classification on deep neural network. IACR Cryptology ePrint Archive 2017, 35 (2017)
Google Scholar
Chollet, F., et al.: Keras (2015). https://github.com/keras-team/keras
Correia, J., Raj, B., Trancoso, I., Teixeira, F.: Mining multimodal repositories for speech affecting diseases. In: Interspeech (2018)
Google Scholar
Cummins, N., Scherer, S., Krajewski, J., Schnieder, S., Epps, J., Quatieri, T.F.: A review of depression and suicide risk assessment using speech analysis. Speech Commun. 71, 10–49 (2015)
Article Google Scholar
Degottex, G., Kane, J., Drugman, T., Raitio, T., Scherer, S.: COVAREP - a collaborative voice analysis repository for speech technologies. In: ICASSP, pp. 960–964, May 2014. https://doi.org/10.1109/ICASSP.2014.6853739
Dias, M., Abad, A., Trancoso, I.: Exploring hashing and cryptonet based approaches for privacy-preserving speech emotion recognition. In: ICASSP. IEEE (2018)
Google Scholar
Dibazar, A.A., Narayanan, S., Berger, T.W.: Feature analysis for automatic detection of pathological speech. In: 24th Annual Conference and the Annual Fall Meeting of the Biomedical Engineering Society EMBS/BMES Conference, vol. 1, pp. 182–183. IEEE (2002)
Google Scholar
Eyben, F., Scherer, K., Schuller, B., Sundberg, J., et al.: The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans. Affect. Comput. 7(2), 190–202 (2016)
Article Google Scholar
Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive 2012, 144 (2012). Informal publication
Google Scholar
Geitgey, A.: Facerecog (2017). https://github.com/ageitgey/face_recognition
Gilad-Bachrach, R., Dowlin, N., Laine, K., et al.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In: ICML. JMLR Workshop and Conference Proceedings, vol. 48, pp. 201–210 (2016)
Google Scholar
Hesamifard, E., Takabi, H., Ghasemi, M.: CryptoDL: deep neural networks over encrypted data. CoRR abs/1711.05189 (2017)
Google Scholar
Lopez-de Ipiña, K., et al.: On automatic diagnosis of Alzheimer’s disease based on spontaneous speech analysis and emotional temperature. Cogn. Comput. 7(1), 44–55 (2015)
Article Google Scholar
López-de Ipiña, K., et al.: On the selection of non-invasive methods based on speech analysis oriented to automatic Alzheimer disease diagnosis. Sensors 13(5), 6730–6745 (2013)
Article Google Scholar
Kroenke, K., Strine, T.W., Spitzer, R.L., Williams, J.B., Berry, J.T., Mokdad, A.H.: The PHQ-8 as a measure of current depression in the general population. J. Affect Disord 114(1–3), 163–173 (2009)
Article Google Scholar
Laine, K., Chen, H., Player, R.: Simple encrypted arithmetic library - SEAL v2.3.0. Technical report, Microsoft, December 2017. https://www.microsoft.com/en-us/research/publication/simple-encrypted-arithmetic-library-v2-3-0/
Orozco-Arroyave, J.R., et al.: Characterization methods for the detection of multiple voice disorders: neurological, functional, and laryngeal diseases. IEEE J. Biomed. Health Inform. 19(6), 1820–1828 (2015)
Article Google Scholar
Pathak, M.A., Raj, B.: Privacy-preserving speaker verification and identification using gaussian mixture models. IEEE Trans. Audio Speech Lang. Process. 21(2), 397–406 (2013). https://doi.org/10.1109/TASL.2012.2215602
Article Google Scholar
Rivest, R.L., Adleman, L., Dertouzos, M.L.: On data banks and privacy homomorphisms. Found. Secure Comput. 169–179 (1978)
Google Scholar
Rouvier, M., Dupuy, G., Gay, P., Khoury, E., Merlin, T., Meignier, S.: An open-source state-of-the-art toolbox for broadcast news diarization. In: Interspeech (2013)
Google Scholar
Schuller, B., et al.: The Interspeech 2017 computational paralinguistics challenge: addressee, cold & snoring. In: Interspeech (2017)
Google Scholar
Socher, R., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: EMNLP 2013, pp. 1631–1642 (2013)
Google Scholar
Teixeira, F., Abad, A., Trancoso, I.: Patient privacy in paralinguistic tasks. In: Interspeech (2018)
Google Scholar
Valstar, M.F., et al.: AVEC 2016 - depression, mood, and emotion recognition workshop and challenge. CoRR abs/1605.01600 (2016). http://arxiv.org/abs/1605.01600

Download references

Author information

Authors and Affiliations

INESC-ID/Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal
Isabel Trancoso, Joana Correia, Francisco Teixeira & Alberto Abad
Carnegie Mellon University, Pittsburgh, USA
Joana Correia & Bhiksha Raj

Authors

Isabel Trancoso
View author publications
You can also search for this author in PubMed Google Scholar
Joana Correia
View author publications
You can also search for this author in PubMed Google Scholar
Francisco Teixeira
View author publications
You can also search for this author in PubMed Google Scholar
Bhiksha Raj
View author publications
You can also search for this author in PubMed Google Scholar
Alberto Abad
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Isabel Trancoso .

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Aleš Horák
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Karel Pala

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Trancoso, I., Correia, J., Teixeira, F., Raj, B., Abad, A. (2018). Speech Analytics for Medical Applications. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science(), vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_3

Download citation

DOI: https://doi.org/10.1007/978-3-030-00794-2_3
Published: 08 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00793-5
Online ISBN: 978-3-030-00794-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics