A Robust Technique for End Point Detection Under Practical Environment

  • Conference paper
Machine Learning, Image Processing, Network Security and Data Sciences (MIND 2020)

Abstract

Speech end point detection is the process of identifying the boundaries of speech in a signal using digital signal processing techniques. The performance of many speech processing applications depends largely on accurate end point detection. In this paper, we address this issue and propose an algorithm to identify the speech boundaries. The algorithm is based on frame-wise pitch and energy estimation to detect the onset and the terminus of an utterance. The performance of the proposed algorithm has been evaluated on three databases, and the results were compared with three state-of-the-art end point detection techniques. Experimental results support the validity of the proposed method and show a significant improvement in end point detection over the other techniques under observation. The proposed Pitch and Energy based Detection (PED) achieves an accuracy of 71 to 87.6% in start point detection and 59 to 76.6% in end (termination) point detection for a ±60 ms resolution window. In terms of detection error, an average improvement of 26.9 ms in the start point and 200.5 ms in the end point is attained compared with the other methods across the different speech corpora. This investigation indicates that the PED technique offers superior results in terms of accuracy and detection error for different data conditions.
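To make the idea of frame-wise pitch and energy estimation concrete, the following is a minimal Python sketch, not the PED algorithm itself, whose details are not given in the abstract. It assumes a frame counts as speech when its energy exceeds a fraction of the peak frame energy and a simple autocorrelation test finds a pitch-like periodicity. All function names, the 30 ms frame and 10 ms hop, the 60 to 400 Hz pitch range, and both thresholds are illustrative assumptions. The helper within_tolerance mirrors the ±60 ms resolution window used to score a detection.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing partial frame dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def is_voiced(frame, sr, fmin=60.0, fmax=400.0, threshold=0.5):
    """Crude pitch test (assumed, not from the paper): look for a strong
    normalized autocorrelation peak at lags matching fmin..fmax Hz."""
    frame = frame - frame.mean()
    energy = np.dot(frame, frame)
    if energy == 0.0:
        return False
    # Keep non-negative lags only; ac[0] equals the frame energy.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    return ac[lo:hi + 1].max() / energy > threshold

def detect_endpoints(x, sr, frame_ms=30, hop_ms=10, energy_ratio=0.1):
    """Return (start_s, end_s) of the span between the first and last frame
    that is both energetic and voiced, or None if no such frame exists."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    energy = (frames ** 2).sum(axis=1)
    energetic = energy > energy_ratio * energy.max()
    voiced = np.array([is_voiced(f, sr) for f in frames])
    speech = np.flatnonzero(energetic & voiced)
    if speech.size == 0:
        return None
    start = speech[0] * hop / sr
    end = (speech[-1] * hop + frame_len) / sr
    return start, end

def within_tolerance(detected_s, reference_s, tol_s=0.060):
    """True if a detected boundary lies within +/- tol_s of the reference."""
    return abs(detected_s - reference_s) <= tol_s
```

Calling detect_endpoints(x, sr) on a mono signal x sampled at sr Hz returns the estimated start and end times in seconds. Requiring both cues is a design choice in this sketch: energy alone tends to fire on loud noise, while periodicity alone can fire on low-level hum. Relying on voicing will also clip unvoiced onsets and endings, which a practical detector would need to handle.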

Author information

Corresponding authors

Correspondence to Nirupam Shome, Rabul Hussain Laskar or Richik Kashyap.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shome, N., Laskar, R.H., Kashyap, R., Bandyopadhyay, S. (2020). A Robust Technique for End Point Detection Under Practical Environment. In: Bhattacharjee, A., Borgohain, S., Soni, B., Verma, G., Gao, XZ. (eds) Machine Learning, Image Processing, Network Security and Data Sciences. MIND 2020. Communications in Computer and Information Science, vol 1241. Springer, Singapore. https://doi.org/10.1007/978-981-15-6318-8_12

  • DOI: https://doi.org/10.1007/978-981-15-6318-8_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6317-1

  • Online ISBN: 978-981-15-6318-8

  • eBook Packages: Computer Science, Computer Science (R0)
