Prosodic word boundary detection from Bengali continuous speech

Abstract

Detecting word boundaries in continuous speech is difficult because there is no definite pause or silence at word boundary positions, which makes continuous speech recognition a very challenging task. However, prosodic word boundaries, unlike written word boundaries, can be predicted from the prosodic parameters of continuous speech. This paper proposes a method for detecting such prosodic word boundaries in Bengali continuous speech. Bengali is a bound-stress language, where stress is observed on the first syllable of a prosodic word. Empirical Mode Decomposition is applied to the logarithm of the fundamental frequency (F0) contour of continuous speech to detect prosodic word boundaries. A total of 200 Bengali readout sentences, read by ten speakers, are analyzed for the present work. An overall prosodic boundary detection accuracy of 88.05% is achieved, with precision and recall of 90.73% and 88.31%, respectively, and an F-score of 89.5. A prosodic word dictionary comprising 5031 prosodic words has been developed by analyzing 1526 Bengali sentences with the proposed prosodic word boundary detection method.
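
As a quick consistency check on the reported figures, the F-score can be recomputed from the stated precision and recall. The sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's F-score is F1; inputs and output are percentages.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract (percent).
reported_precision = 90.73
reported_recall = 88.31

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 1))  # 89.5
```

Under that assumption, the recomputed value agrees with the reported F-score of 89.5.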

Author information

Corresponding author

Correspondence to Tanmay Bhowmik.

About this article

Cite this article

Bhowmik, T., Das Mandal, S.K. Prosodic word boundary detection from Bengali continuous speech. Lang Resources & Evaluation 54, 747–765 (2020). https://doi.org/10.1007/s10579-019-09478-0

Keywords

  • Prosodic word boundaries
  • Fundamental frequency
  • F0 contour
  • Accent command
  • Onset
  • Offset