Duration Modeling Using Multi-model Based on Positional Information

Ramu Reddy, Vempada; Sreenivasa Rao, Krothapalli

doi:10.1007/978-3-642-45062-4_55

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8251)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence

Abstract

This paper proposes predicting syllable durations with a multi-model built on positional information. The proposed multi-model consists of four models: one predicts the durations of syllables in monosyllabic words, and the remaining three predict the durations of syllables at the initial, middle and final positions of polysyllabic words. In this study, (i) linguistic constraints, represented by positional, contextual and phonological features, and (ii) production constraints, represented by articulatory features, are used to predict the duration patterns. Feed-forward neural networks (FFNN) are used to develop the duration models from these features. Prediction accuracy was found to improve with the multi-model compared to a single duration model.
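
The routing idea in the abstract, where each syllable is sent to one of four position-specific models, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `positional_category` and `predict_durations`, the category labels, and the representation of a word as a list of per-syllable feature vectors are all assumptions made for illustration, and the models are plain callables standing in for trained FFNNs.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical labels for the four positional categories named in the abstract.
MONO, INITIAL, MIDDLE, FINAL = "mono", "initial", "middle", "final"

# A model here is any callable mapping a feature vector to a duration.
# In the paper these would be trained FFNNs; that part is not sketched.
Model = Callable[[Sequence[float]], float]


def positional_category(syllable_index: int, word_length: int) -> str:
    """Return the category of a syllable from its 0-based index in the word."""
    if word_length < 1 or not 0 <= syllable_index < word_length:
        raise ValueError("syllable_index must lie inside the word")
    if word_length == 1:
        return MONO      # syllable of a monosyllabic word
    if syllable_index == 0:
        return INITIAL   # first syllable of a polysyllabic word
    if syllable_index == word_length - 1:
        return FINAL     # last syllable of a polysyllabic word
    return MIDDLE        # any syllable in between


def predict_durations(
    words: Sequence[Sequence[Sequence[float]]],
    models: Dict[str, Model],
) -> List[float]:
    """Route every syllable to the model for its position and collect outputs."""
    durations: List[float] = []
    for word in words:
        for index, features in enumerate(word):
            model = models[positional_category(index, len(word))]
            durations.append(model(features))
    return durations
```

In this sketch a two-syllable word has an initial and a final syllable and no middle one; whether the paper treats disyllabic words this way is not stated in the abstract.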

Author information

Authors and Affiliations

School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, West Bengal, India
Vempada Ramu Reddy & Krothapalli Sreenivasa Rao

Editor information

Editors and Affiliations

Machine Intelligence Unit, Indian Statistical Institute, 203, B. T. Road, 700108, Kolkata, India
Pradipta Maji, Ashish Ghosh, Kuntal Ghosh & Sankar K. Pal
Department of Computer Science and Automation, Indian Institute of Science, 560012, Bangalore, India
M. Narasimha Murty

About this paper

Cite this paper

Ramu Reddy, V., Sreenivasa Rao, K. (2013). Duration Modeling Using Multi-model Based on Positional Information. In: Maji, P., Ghosh, A., Murty, M.N., Ghosh, K., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2013. Lecture Notes in Computer Science, vol 8251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45062-4_55

DOI: https://doi.org/10.1007/978-3-642-45062-4_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45061-7
Online ISBN: 978-3-642-45062-4
eBook Packages: Computer Science, Computer Science (R0)
