A Robust Technique for End Point Detection Under Practical Environment

Shome, Nirupam; Laskar, Rabul Hussain; Kashyap, Richik; Bandyopadhyay, Sivaji

doi:10.1007/978-981-15-6318-8_12

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1241)

Included in the following conference series:

International Conference on Machine Learning, Image Processing, Network Security and Data Sciences

Abstract

Speech end point detection is the process of identifying the boundaries of speech in a signal using digital processing techniques. The performance of many speech processing applications depends largely on accurate end point detection. In this paper, we address this issue and propose an algorithm to identify the speech boundaries. The algorithm is based on frame-wise pitch and energy estimation to detect the onset and the terminus of an utterance. The performance of the proposed algorithm has been evaluated on three databases, and the results were compared with three state-of-the-art end point detection techniques. Experimental results support the validity of the proposed method and show a significant improvement in end point detection over the other techniques under observation. The proposed Pitch and Energy based Detection (PED) achieves an accuracy of 71 to 87.6% in start point detection and 59 to 76.6% in end (termination) point detection for a ±60 ms resolution window. In terms of detection error, an average improvement of 26.9 ms in the start point and 200.5 ms in the end point is attained compared with the other methods across the different speech corpora. This investigation indicates that the PED technique offers superior results in terms of accuracy and detection error under different data conditions.
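To make the idea of frame-wise pitch and energy evidence concrete, the following is a minimal sketch and not the PED algorithm itself, whose details are not given in the abstract. It assumes a frame counts as speech when its energy exceeds a fraction of the peak frame energy and its normalized autocorrelation shows a clear peak in a plausible pitch range; the autocorrelation test stands in for a real pitch estimator. All function names, the 30 ms frame and 10 ms hop, the 60 to 400 Hz pitch range, and both thresholds are hypothetical choices for illustration. The helper `within_window` shows how a detection could be scored against a reference boundary with a ±60 ms window.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def voicing_strength(frame, sr, fmin=60.0, fmax=400.0):
    """Peak of the normalized autocorrelation over plausible pitch lags.

    Used here as a simple stand-in for pitch estimation (an assumption).
    """
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0
    lo = int(sr / fmax)                        # shortest pitch period in samples
    hi = min(int(sr / fmin), len(frame) - 1)   # longest pitch period in samples
    best = 0.0
    for lag in range(lo, hi + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / r0)
    return best

def detect_endpoints(signal, sr, frame_ms=30, hop_ms=10,
                     energy_ratio=0.1, voicing_thresh=0.3):
    """Return (start_s, end_s) of the utterance, or None if no speech is found.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = [signal[s:s + frame_len] for s in starts]
    if not frames:
        return None
    energies = [frame_energy(f) for f in frames]
    e_thresh = energy_ratio * max(energies)
    # A frame is labeled speech only if both energy and voicing evidence agree.
    speech = [i for i, (f, e) in enumerate(zip(frames, energies))
              if e > e_thresh and voicing_strength(f, sr) > voicing_thresh]
    if not speech:
        return None
    start_s = speech[0] * hop / sr
    end_s = (speech[-1] * hop + frame_len) / sr
    return start_s, end_s

def within_window(detected_s, reference_s, tol_s=0.060):
    """True if a detected boundary lies within +/- tol_s of the reference."""
    return abs(detected_s - reference_s) <= tol_s
```

A call such as `detect_endpoints(samples, 8000)` on a list of float samples returns the estimated start and end times in seconds. Requiring voicing as well as energy means purely unvoiced onsets or endings (for example fricatives) would be missed by this sketch, which is one reason a practical detector needs more careful boundary refinement.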

Author information

Authors and Affiliations

Assam University, Silchar, 788011, Assam, India
Nirupam Shome & Richik Kashyap
National Institute of Technology, Silchar, 788010, Assam, India
Rabul Hussain Laskar & Sivaji Bandyopadhyay

Corresponding authors

Correspondence to Nirupam Shome, Rabul Hussain Laskar or Richik Kashyap.

Editor information

Editors and Affiliations

National Institute of Technology Silchar, Silchar, India
Arup Bhattacharjee
National Institute of Technology Silchar, Silchar, India
Samir Kr. Borgohain
National Institute of Technology Silchar, Silchar, India
Badal Soni
National Institute of Technology Kurukshetra, Kurukshetra, India
Gyanendra Verma
University of Eastern Finland, Kuopio, Finland
Xiao-Zhi Gao

About this paper

Cite this paper

Shome, N., Laskar, R.H., Kashyap, R., Bandyopadhyay, S. (2020). A Robust Technique for End Point Detection Under Practical Environment. In: Bhattacharjee, A., Borgohain, S., Soni, B., Verma, G., Gao, XZ. (eds) Machine Learning, Image Processing, Network Security and Data Sciences. MIND 2020. Communications in Computer and Information Science, vol 1241. Springer, Singapore. https://doi.org/10.1007/978-981-15-6318-8_12

DOI: https://doi.org/10.1007/978-981-15-6318-8_12
Published: 15 June 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6317-1
Online ISBN: 978-981-15-6318-8
eBook Packages: Computer Science, Computer Science (R0)
