Sadhana, Volume 19, Issue 1, pp 147–169

Significance of knowledge sources for a text-to-speech system for Indian languages

  • B Yegnanarayana
  • S Rajendran
  • V R Ramachandran
  • A S Madhukumar
Artificial Intelligence And Expert Systems

Abstract

This paper discusses the significance of segmental and prosodic knowledge sources for developing a text-to-speech system for Indian languages. Acoustic parameters such as linear prediction coefficients, formants, pitch and gain are prestored for the basic speech sound units corresponding to the orthographic characters of Hindi. The parameters are concatenated based on the input text. These parameters are modified by stored knowledge sources corresponding to coarticulation, duration and intonation. The coarticulation rules specify the pattern of joining the basic units. The duration rules modify the inherent duration of the basic units based on the linguistic context in which the units occur. The intonation rules specify the overall pitch contour for the utterance (declination or rising contour), fall-rise patterns, resetting phenomena and inherent fundamental frequency of vowels. Appropriate pauses between syntactic units are specified to enhance intelligibility and naturalness.
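The pipeline described in the abstract (look up prestored unit parameters, concatenate them according to the input text, then modify them with duration and intonation rules) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the unit names, the numeric values, the single phrase-final lengthening rule, and the linear declination contour. A real inventory would also hold linear prediction coefficients, formants and gain for each unit, and coarticulation rules are omitted.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass
class Unit:
    """Prestored parameters for one basic unit (all values hypothetical)."""
    name: str
    duration_ms: float  # inherent duration of the unit
    pitch_hz: float     # inherent fundamental frequency


# Hypothetical inventory keyed by orthographic character.
INVENTORY: Dict[str, Unit] = {
    "ka": Unit("ka", 120.0, 120.0),
    "ma": Unit("ma", 140.0, 118.0),
    "la": Unit("la", 130.0, 119.0),
}


def concatenate(symbols: List[str]) -> List[Unit]:
    """Look up each symbol and copy its unit so rules never alter the store."""
    return [replace(INVENTORY[s]) for s in symbols]


def apply_duration_rule(units: List[Unit], final_scale: float = 1.2) -> List[Unit]:
    """Illustrative context rule: lengthen the utterance-final unit."""
    if units:
        units[-1].duration_ms *= final_scale
    return units


def apply_declination(units: List[Unit],
                      start_hz: float = 130.0,
                      end_hz: float = 100.0) -> List[Unit]:
    """Illustrative intonation rule: a linear fall in pitch across the units."""
    n = len(units)
    for i, unit in enumerate(units):
        fraction = i / (n - 1) if n > 1 else 0.0
        unit.pitch_hz = start_hz + (end_hz - start_hz) * fraction
    return units


def plan_utterance(symbols: List[str]) -> List[Unit]:
    """Concatenate units, then apply the duration and intonation rules in turn."""
    return apply_declination(apply_duration_rule(concatenate(symbols)))


if __name__ == "__main__":
    for unit in plan_utterance(["ka", "ma", "la"]):
        print(unit)
```

Keeping each knowledge source in its own function mirrors the separation the abstract draws between segmental data and prosodic rules, so a rule can be replaced without touching the stored parameters.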

Keywords

Text-to-speech system; prosodic features; coarticulation; intonation; formants; content word; function word

Copyright information

© the Indian Academy of Sciences 1994

Authors and Affiliations

  • B Yegnanarayana (1)
  • S Rajendran (1)
  • V R Ramachandran (1)
  • A S Madhukumar (1)
  1. Department of Computer Science and Engineering, Indian Institute of Technology, Madras, India