Application of non-negative frequency-weighted energy operator for vowel region detection

International Journal of Speech Technology

Abstract

This paper proposes a novel technique for detecting vowel regions in continuous speech using the envelope of the derivative of the speech signal, which is a non-negative, frequency-weighted energy operator. The method is implemented as a two-stage algorithm. The first stage analyzes the speech signal to detect vowel onset points (VOPs) and vowel end-points (VEPs) from an instantaneous energy contour obtained from the envelope of the derivative of the speech signal. The VOPs and VEPs are located with a peak-finding algorithm based on the first-order Gaussian differentiator. The second stage removes spurious vowel regions and corrects the hypothesized VOP and VEP locations using combined cues from the uniformity of epoch intervals and the strength of excitation of the speech signal. The method is evaluated on the TIMIT acoustic-phonetic speech corpus. It achieves a significantly higher detection rate and a lower false alarm rate than state-of-the-art methods in both clean and noisy environments.
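
The first stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no discrete formulas or parameter values, so everything below is an assumption made for illustration. The energy contour is taken as the squared magnitude of the analytic signal of a finite-difference derivative, and candidate onset and end points are taken as the positive and negative peaks of that contour after convolution with a first-order Gaussian differentiator. The function names, the kernel length (801 samples), the Gaussian width (100 samples), and the relative peak threshold (0.5) are hypothetical choices. The second stage (epoch-interval and excitation-strength cues) is not covered.

```python
import numpy as np
from scipy.signal import hilbert, find_peaks


def envelope_derivative_energy(x):
    """Squared envelope of the signal derivative (non-negative by construction)."""
    x = np.asarray(x, dtype=float)
    dx = np.gradient(x)           # finite-difference derivative (assumed discretization)
    analytic = hilbert(dx)        # analytic signal: dx + j * Hilbert{dx}
    return np.abs(analytic) ** 2  # squared magnitude, never negative


def fogd_kernel(length, sigma):
    """First-order Gaussian differentiator: the derivative of a Gaussian window."""
    n = np.arange(length) - (length - 1) / 2.0
    return -n / sigma**2 * np.exp(-0.5 * (n / sigma) ** 2)


def candidate_points(energy, length=801, sigma=100.0, rel_height=0.5):
    """Return (onset candidates, end candidates, slope contour) as sample indices.

    length, sigma and rel_height are illustrative values, not taken from the paper.
    """
    # Convolving with the Gaussian derivative yields a smoothed slope of the energy.
    slope = np.convolve(energy, fogd_kernel(length, sigma), mode="same")
    # Rising energy gives positive peaks (onset candidates),
    # falling energy gives negative peaks (end candidates).
    onsets, _ = find_peaks(slope, height=rel_height * slope.max())
    ends, _ = find_peaks(-slope, height=rel_height * (-slope).max())
    return onsets, ends, slope


if __name__ == "__main__":
    # Synthetic stand-in for a vowel: a 200 Hz tone between samples 2000 and 6000.
    fs = 8000
    n = np.arange(fs)
    x = np.zeros(fs)
    x[2000:6000] = np.sin(2 * np.pi * 200 * n[2000:6000] / fs)

    energy = envelope_derivative_energy(x)
    onsets, ends, _ = candidate_points(energy)
    print("onset candidates:", onsets)
    print("end candidates:", ends)
```

For a pure tone, the continuous-time form of this operator is proportional to the squared amplitude times the squared frequency, which is the sense in which it is frequency-weighted. On real speech the thresholds and kernel width would need tuning, and the candidates would still require the second-stage verification the abstract describes.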

Author information

Corresponding author

Correspondence to Ramakrishna Thirumuru.

About this article

Cite this article

Thirumuru, R., Vuppala, A.K. Application of non-negative frequency-weighted energy operator for vowel region detection. Int J Speech Technol 21, 279–291 (2018). https://doi.org/10.1007/s10772-018-9505-x

  • DOI: https://doi.org/10.1007/s10772-018-9505-x
