Advanced computing solutions for analysis of laryngeal disorders

  • H. Irem Turkmen
  • M. Elif Karsligil
Review Article


Clinical diagnosis of voice pathologies relies on analyzing the audio, color, shape, and vibration patterns in laryngeal recordings acquired with medical imaging devices such as video-laryngostroboscopes, direct laryngoscopes, and high-speed videoendoscopes. This paper examines state-of-the-art methods and identifies open issues in computing solutions for the analysis and identification of laryngeal disorders. We propose a categorical representation of the most significant applications published so far in terms of their scope, methodology, and achieved results. Laryngeal image and video analysis is discussed in four main categories: segmentation of vocal folds, classification of vocal fold disorders, vocal fold vibration analysis, and vocal fold image stitching. With this study, we highlight new opportunities for vision-based computerized solutions in the evaluation, early diagnosis, and prevention of laryngeal disorders.



Keywords: Vocal fold disorders; Laryngeal image analysis; Blood vessels of vocal folds; Classification of vocal fold disorders; Vibration analysis



  • narrow-band imaging
  • high-speed videoendoscopy
  • principal component analysis
  • glottal area waveform
  • histogram of oriented gradients
  • support vector machine
  • linear discriminant analysis
  • gray-level co-occurrence matrix
  • local binary patterns
  • intrapapillary capillary loops
  • confocal laser endomicroscopy
  • scale-invariant feature transform
  • intentional blurring pixel difference
  • mean structural similarity
  • magnetic resonance imaging
  • computed tomography








Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© International Federation for Medical and Biological Engineering 2019

Authors and Affiliations

  1. Computer Engineering Department, Faculty of Electrical & Electronics Engineering, Yildiz Technical University, Istanbul, Turkey
