Automatic no-reference speech quality assessment with convolutional neural networks

Albuquerque, Renato Q.; Mello, Carlos A. B.

doi:10.1007/s00521-021-05767-4

Original Article
Published: 09 February 2021

Neural Computing and Applications, Volume 33, pages 9993–10003 (2021)

Abstract

This paper presents a convolutional neural network model for automatic speech quality assessment. It is a no-reference method in which convolutional layers act as feature extractors over a visual representation of the speech signal built from Mel-Frequency Cepstral Coefficients. Its performance is compared against PESQ, ViSQOL and P.563. The experiments were conducted on publicly available databases and on an additional database built to evaluate the model in the presence of background noise. The results are analyzed by means of correlation measures and statistical descriptions. From four experiments, we conclude that: (1) the model achieved high overall generalization, even when trained with a limited number of samples; (2) it characterized speech and background sound even for databases where complex degradation is present; and (3) it tends to assign high scores to clean speech and low scores to samples containing only noise, as expected.
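
The abstract states that results are analyzed by means of correlation measures but does not name them. As a minimal sketch, assuming the Pearson and Spearman coefficients commonly used to compare objective quality scores with subjective ratings, the comparison could look like the following. The function names and the score values are hypothetical and are not taken from the article.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five speech samples (illustrative values only).
subjective_scores = [1.2, 2.5, 3.1, 3.8, 4.6]
predicted_scores = [1.5, 2.2, 3.4, 3.6, 4.4]

print("Pearson :", round(pearson(subjective_scores, predicted_scores), 3))
print("Spearman:", round(spearman(subjective_scores, predicted_scores), 3))
```

Pearson measures how linearly the predicted scores follow the subjective ones, while Spearman only checks whether the two rankings agree, so reporting both separates calibration errors from ordering errors.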

Acknowledgements

The authors thank NVIDIA Corporation for its support through the donation of the Titan XP GPU used for this research.

Author information

Authors and Affiliations

Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil
Renato Q. Albuquerque & Carlos A. B. Mello

Corresponding author

Correspondence to Carlos A. B. Mello.

About this article

Cite this article

Albuquerque, R.Q., Mello, C.A.B. Automatic no-reference speech quality assessment with convolutional neural networks. Neural Comput & Applic 33, 9993–10003 (2021). https://doi.org/10.1007/s00521-021-05767-4

Received: 14 October 2020
Accepted: 21 January 2021
Published: 09 February 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s00521-021-05767-4
