Automated speech signal analysis based on feature extraction and classification of spasmodic dysphonia: a performance comparison of different classifiers

Umapathy, Snekhalatha; Rachel, Shamila; Thulasi, Rajalakshmi

doi:10.1007/s10772-017-9471-8

Published: 31 October 2017

Volume 21, pages 9–18 (2018)

International Journal of Speech Technology

Snekhalatha Umapathy ORCID: orcid.org/0000-0002-4032-1349¹,
Shamila Rachel¹ &
Rajalakshmi Thulasi¹

Abstract

Spasmodic dysphonia is a voice disorder caused by involuntary spasms of the muscles in the voice box. These spasms interrupt the opening of the vocal folds and can lead to a breathy or strangled voice and to soundless voice breaks. There is no specific test for the diagnosis of spasmodic dysphonia. The cause of the disorder is unknown and there is no cure, but treatments can improve the quality of the voice. The objectives of the study are (i) to diagnose dysphonia and to carry out a comparative analysis of continuous speech signals and the sustained phonation /a/ by extracting acoustic features; (ii) to extract the acoustic features by a semi-automated method using PRAAT software and by an automated method using the FFT algorithm; and (iii) to classify normal subjects and spasmodic dysphonia patients using different classifiers, namely a back-propagation network (BPN) trained with the Levenberg-Marquardt algorithm, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), and to compare them on sensitivity and accuracy. Thirty normal subjects and thirty patients with spasmodic dysphonia were considered in the study. The performance of the three classifiers was studied, and it was observed that SVM and KNN were 100% accurate, whereas the Levenberg-Marquardt BPN produced an accuracy of about 96.7%. The voice samples of dysphonia patients showed variations from the normal speech samples. The automated analysis method was able to detect dysphonia and provided better results than the semi-automated method.
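The abstract compares the classifiers on sensitivity and accuracy. As a minimal sketch of how these two figures follow from a confusion matrix, the Python below uses hypothetical counts: the abstract does not report the confusion matrices or the evaluation split, so the counts are illustrative assumptions only, chosen for a group of 30 normal and 30 dysphonic speakers. Dysphonia is treated as the positive class.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of dysphonic speakers correctly flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all speakers correctly classified."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical confusion counts (NOT taken from the paper).
# tp/fn count dysphonic speakers, tn/fp count normal speakers.
no_errors = {"tp": 30, "tn": 30, "fp": 0, "fn": 0}
two_errors = {"tp": 29, "tn": 29, "fp": 1, "fn": 1}

for name, c in (("no errors", no_errors), ("two errors", two_errors)):
    acc = accuracy(**c) * 100
    sens = sensitivity(c["tp"], c["fn"]) * 100
    print(f"{name}: accuracy = {acc:.1f}%, sensitivity = {sens:.1f}%")
```

Under these assumed counts, two misclassifications out of 60 give an accuracy of 58/60, which rounds to 96.7%. The same rounded value would also arise from one error in a held-out set of 30, so the figure alone does not identify the evaluation protocol.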

Author information

Authors and Affiliations

Department of Biomedical Engineering, Faculty of Engineering & Technology, SRM University, Kattankulathur, Chennai, Tamil Nadu, 603203, India
Snekhalatha Umapathy, Shamila Rachel & Rajalakshmi Thulasi

Corresponding author

Correspondence to Snekhalatha Umapathy.

Additional information

Work carried out at: Department of Biomedical Engineering, Faculty of Engineering & Technology, SRM University.

About this article

Cite this article

Umapathy, S., Rachel, S. & Thulasi, R. Automated speech signal analysis based on feature extraction and classification of spasmodic dysphonia: a performance comparison of different classifiers. Int J Speech Technol 21, 9–18 (2018). https://doi.org/10.1007/s10772-017-9471-8

Received: 13 July 2017
Accepted: 27 September 2017
Published: 31 October 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s10772-017-9471-8
