A Robust Technique for End Point Detection Under Practical Environment

  • Nirupam Shome
  • Rabul Hussain Laskar
  • Richik Kashyap
  • Sivaji Bandyopadhyay
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1241)


Speech end point detection is the process of identifying the boundaries of speech in a signal using digital processing techniques. The performance of many speech processing applications depends largely on accurate end point detection. In this paper, we address this issue and propose an algorithm to identify the speech boundaries. The algorithm is based on frame-wise pitch and energy estimation to detect the onset and the terminus of an utterance. The performance of the proposed algorithm has been evaluated on three databases, and the results were compared with three state-of-the-art end point detection techniques. Experimental results support the validity of the proposed method and show a significant improvement in end point detection over the other techniques under observation. The proposed Pitch and Energy based Detection (PED) achieves an accuracy of 71 to 87.6% in start point detection and 59 to 76.6% in end (termination) point detection for a ±60 ms resolution window. In terms of detection error, an average improvement of 26.9 ms in the start point and 200.5 ms in the end point is attained compared with the other methods across the different speech corpora. This investigation indicates that the PED technique offers superior results in terms of accuracy and detection error under different data conditions.
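As an illustration of the general idea of combining frame-wise energy with a pitch (periodicity) cue, a minimal sketch follows. It is not the PED algorithm itself, whose details are not given in the abstract. The autocorrelation-based periodicity measure, the function names, the frame sizes, and both thresholds are assumptions chosen for this example only. The helper `within_tolerance` shows how a detected boundary might be scored against a reference using a ±60 ms window.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames (assumes len(x) >= frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def detect_endpoints(x, sr, frame_ms=30, hop_ms=10,
                     energy_ratio=0.05, periodicity_min=0.5,
                     f0_range=(60.0, 400.0)):
    """Return (start_s, end_s) of the speech region, or None if none found.

    All parameter defaults are hypothetical values for this sketch.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)

    # Short-time energy of each frame.
    energy = np.sum(frames ** 2, axis=1)

    # Periodicity cue: peak of the normalised autocorrelation
    # inside the lag range that corresponds to plausible pitch values.
    lag_min = int(sr / f0_range[1])
    lag_max = min(int(sr / f0_range[0]), frame_len - 1)
    periodicity = np.zeros(len(frames))
    for i, f in enumerate(frames):
        f = f - f.mean()
        ac = np.correlate(f, f, mode="full")[frame_len - 1:]
        if ac[0] > 0:
            periodicity[i] = ac[lag_min:lag_max + 1].max() / ac[0]

    # A frame counts as speech only if it is both energetic and periodic.
    speech = (energy > energy_ratio * energy.max()) & (periodicity > periodicity_min)
    active = np.flatnonzero(speech)
    if active.size == 0:
        return None
    start_s = active[0] * hop / sr
    end_s = (active[-1] * hop + frame_len) / sr
    return start_s, end_s

def within_tolerance(detected_s, reference_s, tol_s=0.060):
    """True if a detected boundary lies within +/- tol_s of the reference."""
    return abs(detected_s - reference_s) <= tol_s

# Synthetic check: 0.5 s of faint noise, 1.0 s of a 150 Hz tone, 0.5 s of noise.
sr = 8000
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
signal = 0.001 * rng.standard_normal(2 * sr)
signal[sr // 2: sr // 2 + sr] += 0.5 * np.sin(2 * np.pi * 150 * t)
start_s, end_s = detect_endpoints(signal, sr)
print(within_tolerance(start_s, 0.5), within_tolerance(end_s, 1.5))
```

Requiring both cues means that an energetic but aperiodic burst (for example a door slam) is not taken as a boundary, at the cost of possibly clipping unvoiced consonants at the edges of an utterance.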


Voice activity detection; Glottal activity detection; Vowel onset point detection; Pitch and energy based detection; Accuracy in end point detection; Error in end point detection



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Assam University, Silchar, India
  2. National Institute of Technology, Silchar, India
