An Ensemble Approach for the Diagnosis of COVID-19 from Speech and Cough Sounds

  • Conference paper
Speech and Computer (SPECOM 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Abstract

Mass detection of COVID-19 infections has proven to be a very hard problem. In this work, we describe the systems we developed to diagnose COVID-19 cases from cough sounds and speech. We propose a hybrid configuration that employs a Convolutional Neural Network (CNN), a Time Delay Neural Network (TDNN), and a Long Short-Term Memory (LSTM) network to extract cough-sound and speech embeddings. The proposed framework also applies SpecAugment-based on-the-fly data augmentation and multi-level statistics pooling to map frame-level information into an utterance-level embedding. For the final decision, i.e., determining whether a given recording comes from a COVID-19-negative or a COVID-19-positive patient, we employ classical support vector machine, random forest, AdaBoost, decision tree, and logistic regression classifiers. We also adopt an end-to-end approach that trains a ResNet model with a one-class softmax loss function to make the positive-versus-negative decision over high-resolution hand-crafted features. Experiments are carried out on two subsets of the Cambridge COVID-19 Sound database, denoted COVID-19 Speech Sounds (CSS) and COVID-19 Cough Sounds (CCS), and results are reported on the development and test sets of both subsets. Our approach outperforms the baselines provided by the challenge organizers on the development set. It indicates that using speech to remotely detect early COVID-19 infections, and eventually other respiratory diseases, is likely possible, opening up a promising, cheap, and scalable pre-diagnosis avenue for better handling pandemics.
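The paper does not include code, but the multi-level statistics pooling mentioned in the abstract can be sketched as follows: frame-level features from several network layers are each reduced to their per-dimension mean and standard deviation, and the resulting statistics are concatenated into one fixed-length utterance embedding. The function names and the random arrays standing in for layer outputs are illustrative, not from the paper.

```python
import numpy as np

def stats_pool(frames):
    """Map frame-level features of shape (T, D) to a fixed-length vector
    of shape (2*D,) by concatenating per-dimension mean and std."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def multi_level_stats_pool(layer_outputs):
    """Concatenate pooled statistics from several layers of the encoder."""
    return np.concatenate([stats_pool(frames) for frames in layer_outputs])

# Simulated frame-level outputs from two encoder layers (200 frames each,
# with 64- and 128-dimensional features respectively).
rng = np.random.default_rng(0)
layers = [rng.normal(size=(200, 64)), rng.normal(size=(200, 128))]

emb = multi_level_stats_pool(layers)
print(emb.shape)  # (384,) = 2*64 + 2*128
```

The utterance-level embedding produced this way is then what a downstream classifier (e.g., an SVM or logistic regression) would consume for the negative-versus-positive decision.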


Notes

  1. https://github.com/auDeep/auDeep.

  2. https://github.com/DeepSpectrum/DeepSpectrum.


Author information

Corresponding author

Correspondence to Jahangir Alam.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Fathan, A., Alam, J., Kang, W.H. (2021). An Ensemble Approach for the Diagnosis of COVID-19 from Speech and Cough Sounds. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science (LNAI), vol. 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-87802-3_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87801-6

  • Online ISBN: 978-3-030-87802-3

  • eBook Packages: Computer Science, Computer Science (R0)
