Application of non-negative frequency-weighted energy operator for vowel region detection

Thirumuru, Ramakrishna; Vuppala, Anil Kumar

doi:10.1007/s10772-018-9505-x

Published: 10 April 2018

Volume 21, pages 279–291, (2018)

International Journal of Speech Technology

Ramakrishna Thirumuru¹ &
Anil Kumar Vuppala¹


Abstract

This paper proposes a novel technique for detecting vowel regions in continuous speech using the envelope of the derivative of the speech signal, which is a non-negative, frequency-weighted energy operator. The method is implemented as a two-stage algorithm. In the first stage, the speech signal is analyzed to detect vowel onset points (VOPs) and vowel end-points (VEPs) from an instantaneous energy contour obtained from the envelope of the derivative of the speech signal. The VOPs and VEPs are located with a peak-finding algorithm based on the first-order Gaussian differentiator. In the second stage, spurious vowel regions are removed and the hypothesized VOP and VEP locations are corrected using combined cues from the uniformity of epoch intervals and the strength of excitation of the speech signal. The method is evaluated on the TIMIT acoustic-phonetic speech corpus. It achieves a significantly higher detection rate and a lower false alarm rate than state-of-the-art methods in both clean and noisy environments.
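
The first stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the energy contour is the squared magnitude of the analytic signal of a central-difference derivative, and that onset and end candidates are the positive and negative peaks of that contour after convolution with a first-order Gaussian differentiator. The function names, kernel length, sigma, and relative threshold are all hypothetical choices made for illustration, and the second-stage correction with epoch-based cues is not shown.

```python
import numpy as np


def envelope_derivative_energy(x):
    """Squared envelope of the signal derivative (non-negative by construction)."""
    x = np.asarray(x, dtype=float)
    # Central-difference approximation of the derivative (an assumption here).
    dx = np.zeros_like(x)
    dx[1:-1] = 0.5 * (x[2:] - x[:-2])
    # Analytic signal via the FFT: keep DC, double positive frequencies,
    # zero the negative ones.
    n = len(dx)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(np.fft.fft(dx) * weights)
    return np.abs(analytic) ** 2


def fogd_kernel(length, sigma):
    """First-order Gaussian differentiator: the derivative of a Gaussian window."""
    t = np.arange(length) - (length - 1) / 2.0
    return -t * np.exp(-0.5 * (t / sigma) ** 2) / sigma**2


def detect_onsets_and_ends(energy, kernel, rel_threshold=0.3):
    """Return candidate onset and end indices from peaks of the smoothed slope."""
    # Convolving with the Gaussian derivative gives a smoothed slope of the
    # energy contour: positive where energy rises, negative where it falls.
    slope = np.convolve(energy, kernel, mode="same")
    mid = slope[1:-1]
    is_max = (mid > slope[:-2]) & (mid >= slope[2:])
    is_min = (mid < slope[:-2]) & (mid <= slope[2:])
    limit = rel_threshold * np.max(np.abs(slope))
    onsets = np.where(is_max & (mid > limit))[0] + 1
    ends = np.where(is_min & (mid < -limit))[0] + 1
    return onsets, ends


if __name__ == "__main__":
    # Synthetic stand-in for a vowel: a 400 Hz tone burst surrounded by silence.
    fs = 8000
    n = np.arange(fs)
    w = 2 * np.pi * 400 / fs
    x = np.where((n >= 2000) & (n < 6000), np.sin(w * (n - 2000)), 0.0)
    energy = envelope_derivative_energy(x)
    onsets, ends = detect_onsets_and_ends(energy, fogd_kernel(801, 100.0))
    print("onset candidates:", onsets)
    print("end candidates:", ends)
```

For a steady sinusoid of amplitude A and digital frequency w, this contour is approximately A² sin²(w), which is what makes the measure both non-negative and frequency-weighted. On real speech the kernel width and threshold would need tuning.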

Author information

Authors and Affiliations

Speech Processing Lab, KCIS, International Institute of Information Technology, Hyderabad (IIIT-H), Hyderabad, India
Ramakrishna Thirumuru & Anil Kumar Vuppala

Corresponding author

Correspondence to Ramakrishna Thirumuru.

About this article

Cite this article

Thirumuru, R., Vuppala, A.K. Application of non-negative frequency-weighted energy operator for vowel region detection. Int J Speech Technol 21, 279–291 (2018). https://doi.org/10.1007/s10772-018-9505-x

Received: 22 November 2017
Accepted: 29 March 2018
Published: 10 April 2018
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10772-018-9505-x
