Vowel Onset Point Detection in Hindi Language Using Long Short-Term Memory

  • Arpan Jain
  • Amandeep Singh
  • Anupam Shukla
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 862)


In this paper, we discuss the Vowel Onset Point (VOP) for the Hindi language and its significance in speech recognition. We define the vowel onset point and describe how it can be computed. Letters of the Hindi alphabet combine a consonant part and a vowel part, and in Hindi a consonant cannot be pronounced without a vowel. Between the consonant and the vowel there is a very short region in which the transition from consonant to vowel takes place. We use characteristics of the sound files to locate the vowel onset point. To compute the VOP, we first apply a filtering step and then use the energy of the signal and its formants, combined with the epoch interval and the Itakura distance. Filtered energy and filtered formants can be used as cues for detecting the VOP to within \(\pm 30\) ms. To further increase the effectiveness of the proposed method, we use Recurrent Neural Network variants to detect the VOP; these take as input speech features and a reference point computed from the filtered formants.
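As a rough illustration of the energy cue and the \(\pm 30\) ms tolerance mentioned above, the following Python sketch estimates an onset as the point of steepest rise in smoothed short-time energy. It is not the method proposed in the paper: it leaves out the formants, the epoch interval, the Itakura distance and the LSTM stage. All function names, the frame and hop sizes, and the synthetic test signal are assumptions made for this example only.

```python
import math
import random


def synthetic_cv(sample_rate, onset_s, total_s, seed=0):
    """Toy consonant-vowel signal (assumption): weak noise, then a 200 Hz tone."""
    rng = random.Random(seed)
    onset = round(sample_rate * onset_s)
    total = round(sample_rate * total_s)
    signal = []
    for n in range(total):
        if n < onset:
            signal.append(rng.uniform(-0.05, 0.05))  # consonant-like noise
        else:
            signal.append(math.sin(2 * math.pi * 200 * n / sample_rate))
    return signal


def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def moving_average(values, width):
    """Centered moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def detect_vop(signal, sample_rate, frame_ms=20.0, hop_ms=5.0, smooth_frames=5):
    """Return the time (s) of the steepest rise in smoothed frame energy."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    energy = moving_average(short_time_energy(signal, frame_len, hop), smooth_frames)
    if len(energy) < 2:
        raise ValueError("signal too short for the chosen frame settings")
    # Rise between consecutive frames; the largest one marks the onset candidate.
    rises = [energy[i + 1] - energy[i] for i in range(len(energy) - 1)]
    k = max(range(len(rises)), key=rises.__getitem__)
    # Report the center of the frame that follows the largest rise.
    center_sample = (k + 1) * hop + frame_len / 2
    return center_sample / sample_rate


def within_tolerance(detected_s, reference_s, tol_ms=30.0):
    """True if the detected onset lies within tol_ms of the reference."""
    return abs(detected_s - reference_s) * 1000 <= tol_ms


if __name__ == "__main__":
    sig = synthetic_cv(sample_rate=8000, onset_s=0.2, total_s=0.5)
    vop = detect_vop(sig, sample_rate=8000)
    print(vop, within_tolerance(vop, 0.2))
```

On this toy signal, whose onset is placed at 0.2 s, the estimate falls inside the 30 ms window. Real speech would need the additional cues the abstract lists, since energy alone rises for many non-vowel events as well.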


VOP, Filter formants, Filtered energy, RNN, LSTM



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Indian Institute of Information Technology and Management Gwalior, Gwalior, India
