Abstract
Speech end point detection is the process of identifying speech boundaries using digital processing techniques. The performance of many speech processing applications depends largely on accurate end point detection. In this paper, we address this issue and propose an algorithm to identify speech boundaries. The algorithm is based on frame-wise pitch and energy estimation to detect the onset and the terminus of an utterance. The performance of the proposed algorithm has been evaluated on three databases, and the results were compared with three state-of-the-art end point detection techniques. Experimental results confirm the validity of the proposed method and show a significant improvement in end point detection over the other techniques under observation. The proposed Pitch and Energy based Detection (PED) achieves an accuracy of 71% to 87.6% in start point detection and 59% to 76.6% in end (termination) point detection for a ±60 ms resolution window. In terms of detection error, an average improvement of 26.9 ms in the start point and 200.5 ms in the end point is attained compared with the other methods across the different speech corpora. This investigation indicates that the PED technique offers superior results in terms of accuracy and detection error for different data conditions.
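The abstract describes PED only at a high level (frame-wise pitch and energy estimation). The sketch below is not the authors' algorithm; it is a minimal illustration, under stated assumptions, of the general idea of marking a frame as speech only when it is both loud enough (short-time energy) and periodic in a pitch range (normalized autocorrelation), then taking the first and last such frames as the onset and terminus. It also scores detections against a ±60 ms window, assuming "accuracy" means the percentage of utterances whose detected point falls within ±60 ms of the reference. The function names, the 25 ms frame and 10 ms hop, the 30 dB energy margin, the 0.5 voicing threshold, the 60 to 400 Hz pitch range and the three-frame minimum run are all hypothetical choices.

```python
import math
import random


def frame_signal(x, frame_len, hop):
    """Split a sample list into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def log_energy(frame):
    """Short-time mean energy in dB; the epsilon avoids log10(0) on digital silence."""
    e = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(e + 1e-12)


def voicing_strength(frame, fs, f0_min=60.0, f0_max=400.0):
    """Peak normalized autocorrelation over lags in an assumed 60-400 Hz pitch range.

    Values near 1 mean the frame is strongly periodic (voiced). The lag of the
    peak would give the pitch period, but only the peak height is used here.
    """
    mean = sum(frame) / len(frame)
    f = [s - mean for s in frame]
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), len(f) - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        a, b = f[:-lag], f[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0.0:
            best = max(best, num / den)
    return best


def first_run_start(flags, min_run):
    """Index of the first run of at least `min_run` consecutive True flags, else None."""
    for i in range(len(flags) - min_run + 1):
        if all(flags[i:i + min_run]):
            return i
    return None


def detect_endpoints(x, fs, frame_ms=25.0, hop_ms=10.0,
                     energy_drop_db=30.0, voicing_thresh=0.5, min_run=3):
    """Return (start_s, end_s) of the utterance, or None if no speech is found.

    A frame counts as speech when its energy is within `energy_drop_db` of the
    loudest frame AND it is voiced (periodic in the assumed pitch range).
    """
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    if not frames:
        return None
    energies = [log_energy(fr) for fr in frames]
    e_thresh = max(energies) - energy_drop_db
    # `and` short-circuits, so the costly pitch check runs only on loud frames.
    is_speech = [e >= e_thresh and voicing_strength(fr, fs) >= voicing_thresh
                 for fr, e in zip(frames, energies)]
    start_idx = first_run_start(is_speech, min_run)
    if start_idx is None:
        return None
    # Search backwards from the end of the recording for the terminus.
    end_idx = len(is_speech) - 1 - first_run_start(is_speech[::-1], min_run)
    return start_idx * hop / fs, (end_idx * hop + frame_len) / fs


def accuracy_within(detected, reference, tol_s=0.060):
    """Percentage of utterances whose detected point lies within +/- tol_s of the reference."""
    hits = sum(1 for d, r in zip(detected, reference) if abs(d - r) <= tol_s)
    return 100.0 * hits / len(reference)


if __name__ == "__main__":
    # Synthetic utterance: 0.25 s of low noise, 0.5 s of a 150 Hz tone, 0.25 s of noise.
    fs = 8000
    rng = random.Random(0)
    x = [0.001 * rng.gauss(0.0, 1.0) for _ in range(fs)]
    for n in range(2000, 6000):
        x[n] += 0.5 * math.sin(2 * math.pi * 150 * n / fs)
    start, end = detect_endpoints(x, fs)
    print(f"start={start:.3f} s, end={end:.3f} s")
    print(accuracy_within([start], [0.25]), accuracy_within([end], [0.75]))
```

Requiring both conditions rejects loud but aperiodic noise and quiet periodic hum. The trade-off in this sketch is that an utterance beginning or ending with an unvoiced sound such as /s/ is clipped, because those frames fail the voicing test.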
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shome, N., Laskar, R.H., Kashyap, R., Bandyopadhyay, S. (2020). A Robust Technique for End Point Detection Under Practical Environment. In: Bhattacharjee, A., Borgohain, S., Soni, B., Verma, G., Gao, XZ. (eds) Machine Learning, Image Processing, Network Security and Data Sciences. MIND 2020. Communications in Computer and Information Science, vol 1241. Springer, Singapore. https://doi.org/10.1007/978-981-15-6318-8_12
DOI: https://doi.org/10.1007/978-981-15-6318-8_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6317-1
Online ISBN: 978-981-15-6318-8
eBook Packages: Computer Science, Computer Science (R0)