Phonological Rules for TTS

Epoch Synchronous Overlap Add (ESOLA)

Part of the book series: Signals and Communication Technology (SCT)

Abstract

In speech synthesis, it is necessary to determine the phonemes that correspond to the graphemes in every word to be converted to speech. This chapter deals with that process, normally referred to as text-to-phoneme or grapheme-to-phoneme conversion. Many rules for such conversion, known as phonology in linguistic parlance, have been proposed by eminent linguists for the Standard Colloquial Bengali (SCB) dialect. Unfortunately, these rules are not in a computer-implementable form. This chapter presents the development of a rule-based grapheme-to-phoneme (G2P) conversion system for SCB.

Appendix

The following rule table (an example of which is shown in Table 4.3) lists some of the compiled rules. In this table, a vowel ligature is represented by a “+” sign between the consonant and the vowel. Similarly, a consonant cluster is represented by a “+” sign between the two consonants. Where the position of the input string within the word matters, “*” is used to indicate that position; in all other cases the position of the input string in the word is not significant. Note also that the RDB table contains only ASCII strings in the “Input String” and “Output String” columns. POS and semantic information are not used in the present case and are therefore not shown in this table. The hidden অ (A) of any consonant C is represented by the string “A” immediately after it.

| Applied phonological rule(s) | Input string | Output string | Meaning of special symbols, if any, in the input and output strings |
|---|---|---|---|
| 4.3.2 (1) | AI (অই) /ɔi/ | OI (ওই) /oi/ | |
| 4.3.2 (1) | AU (অউ) /ɔu/ | OU (ওউ) /ou/ | |
| 4.3.2 (1), 4.3.2 (6) | AK+SA (অক্ষ) /ɔkSɔ/ | OK+SO (ওক্ষ) /okSo/ | |
| 4.3.2 (2) | CAN /Cɔn/ | CON /Con/ | |
| 4.3.2 (2) | CAN0 /Cɔɳ/ | CON0 /Coɳ/ | |
| 4.3.2 (3) | CA /Cɔ/ | CO /Co/ | |
| 4.3.2 (4) | CACA /CɔCɔ/ | CAC /CɔC/ | |
| 4.3.2 (5) | *C+CA /C+Cɔ/ | *C+CO /C+Co/ | ‘*’ represents any one of ই (I), উ (U), ক্ষ (K+S1), or জ্ঞ (J+N1) |
| 4.3.2 (6) | C+CA /CCɔ/ | C+CO /CCo/ | |
| 4.3.2 (8) | C+(R0+I)CA (C+ঋ ligature CA) /CriCɔ/ | C+(R0+I)CO (C+ঋ ligature CO) /CriCo/ | |
| 4.3.2 (9) | CACA /CɔCɔ/ | CACO /CɔCo/ | |
| 4.3.2 (10) | ACAC /CɔC/ | ACOC /CoC/ | |
| 4.3.2 (10) | AACAC /CɔC/ | AACOC /CoC/ | |
| 4.3.2 (10) | CACAC /CɔCɔC/ | CACOC /CɔCoC/ | |
| 4.3.2 (10) | C+AACAC /CɐCɔC/ | C+AACOC /CɐCoC/ | |
| 4.3.2 (11) | HAC+E /hɔle/ | HOC+E /hole/ | |
| 4.3.2 (12) | CA*CH+E /Cɔ*tʃhe/ | CO*CH+E /Co*tʃhe/ | ‘*’ represents the middle part of the word |
| 4.3.2 (13) | C+RYA /Cryɔ/ | C+RAYA /Crɔyɔ/ | |
| 4.3.2 (14) | C*YA /C*yɔ/ | C*YO /C*yo/ | ‘*’ represents any ligature other than ɐ (AA) |
| 4.3.2 (15) | C+AAYA /Cayɔ/ | C+AAY /Cay/ | |
| 4.3.2 (16) | YAC+AA /yɔCɐ/ | YC+AA /yCɐ/ | |
| 4.3.2 (17) | NGCA /ŋCɔ/ | NGCO /ŋCo/ | |
| 4.3.2 (18) | EKAC /ekɔC/ | EEKAC /ækɔC/ | |
| 4.3.2 (18) | EKAC+C /ekɔCC/ | EEKAC+C /ækɔCC/ | |
| 4.3.3 (1) | EC* /eC*/ | EC* /eC*/ | ‘*’ represents vowel ই (I) or উ (U) |
| 4.3.3 (1) | EC* /eC*/ | EEC* /æC*/ | ‘*’ represents any vowel other than ই (I), উ (U) |
| 4.3.3 (1) | ECAC* /eCɔC*/ | ECAC* /eCɔC*/ | ‘*’ represents vowel ই (I) or উ (U) |
| 4.3.3 (1) | ECAC* /eCɔC*/ | EECAC* /æCɔC*/ | ‘*’ represents any vowel other than ই (I), উ (U) |
| 4.3.4 (1) | J+N1+V* /dzɳV*/ | G+V͂* /gV͂*/ | ‘*’ represents the rest of the word. V represents any vowel and V͂ represents the nasal counterpart of V |
| 4.3.4 (2) | *J+N1+V /*dzɳV/ | *GG+V͂ /*ggV͂/ | ‘*’ represents the previous part of the word. V represents any vowel and V͂ represents the nasal counterpart of V |
| 4.3.4 (3) | J+N1+AA* /dzɳa*/ | G+EE0* /gæ͂*/ | ‘*’ represents the rest of the word |
| 4.3.4 (4) | J+N1A* /dzɳɔ*/ | G+O0* /gõ*/ | ‘*’ represents the rest of the word |
| 4.3.4 (4) | *J+N1A /*dzɳɔ/ | *GG+O0 /*ggõ/ | ‘*’ represents the previous part of the word |
| 4.3.5 (1) | *C+Y+V /*CyV/ | *CC+Y+V /*CCyV/ | ‘*’ represents the previous part of the word. V represents any vowel |
| 4.3.5 (2) | *C+Y+AA /*Cyɐ/ | *CC+Y+EE /*CCyæ/ | ‘*’ represents the previous part of the word |
| 4.3.5 (2) | C+Y+AA* /Cya*/ | C+Y+EE* /Cyæ*/ | ‘*’ represents the rest of the word |
| 4.3.5 (3) | *C+YA /*Cyɔ/ | *CC+YO /*CCyo/ | ‘*’ represents the previous part of the word |
| 4.3.5 (4) | *H+Y+V /*hyV/ | *JJH+V /*dzdzhV/ | ‘*’ represents the previous part of the word. V represents any vowel |
| 4.3.6 (1) | C+B* /Cb*/ | C* /C*/ | ‘*’ represents the rest of the word |
| 4.3.6 (2) | *C+B /*Cb/ | *CC /*CC/ | ‘*’ represents the previous part of the word. Here C represents any consonant except H |
| 4.3.6 (3) | *VH+B /*Vhb/ | *VOBH /*Vobh/ | ‘*’ represents the previous part of the word. V represents vowel অ (A) or আ (AA) |
| 4.3.6 (3) | *IH+B /*ihb/ | *IUBH /*iubh/ | |
| 4.3.6 (4) | *C+C+B /*CCb/ | *C+C /*CC/ | ‘*’ represents the previous part of the word |
| 4.3.7 (1) | C+M* /Cm*/ | C* /C*/ | ‘*’ represents the rest of the word |
| 4.3.7 (2) | *C+M+V /*CCV/ | *CC+V͂ /*CCV͂/ | ‘*’ represents the previous part of the word. Here C is a stop or sibilant. V is any vowel and V͂ is its nasal counterpart |
| 4.3.8 (1) | *C+R /*Cr/ | *CCR /*CCr/ | ‘*’ represents the previous part of the word |
| 4.3.9 (1) | CV /CV/ | CV͂ /CV͂/ | C is the consonant ম (M) or ন (N). V is any vowel and V͂ is its nasal counterpart |
| 4.3.10 (1) | S /ʃ/ | SH /Ç/ | |
| 4.3.10 (2) | S+CC/ | S1+C /sC/ | Here C is the consonant ট (T0) or ঠ (TH0) |
| 4.3.10 (3) | S+CC/ | S+CC/ | Here C is any one of the consonants ত (T), থ (TH), ন (N), প (P), ফ (PH), র (R), ল (L), ক (K), খ (KH) |
| 4.3.10 (3) | S+CC/ | SH+CC/ | Here C is any consonant other than ত (T), থ (TH), ন (N), প (P), ফ (PH), র (R), ল (L), ক (K), খ (KH) |
| 4.3.11 (1) | V+ | V͂ /V͂/ | V is any vowel and V͂ is its nasal counterpart |
| 4.3.11 (2) | C++AA | C /C / | V is any vowel and V͂ is its nasal counterpart |
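To make the table's conventions concrete, the following is a minimal sketch of how a few of these rows might be applied mechanically. It is not the chapter's implementation. Several things are assumptions made only for illustration: the word is assumed to be already split into a list of ASCII symbols (the table does not say how symbols are delimited), the `CONSONANTS` set is a small hypothetical subset of the inventory, and the names `Rule`, `RULES`, `match_at`, and `apply_rules` are invented here. A leading ‘*’ in the table is read as "not word-initial" and a trailing ‘*’ as "word-initial", which is one plausible reading of "previous part" and "rest of the word".

```python
from dataclasses import dataclass

# Assumed subset of consonant symbols in the ASCII notation (hypothetical, incomplete).
CONSONANTS = frozenset({"K", "KH", "G", "J", "T", "TH", "N", "P", "PH",
                        "B", "M", "R", "L", "S", "SH", "H"})


@dataclass(frozen=True)
class Rule:
    rule_id: str                      # rule number as given in the table
    pattern: tuple                    # input symbols; "C" matches any consonant
    output: tuple                     # output symbols; "C" copies the matched consonant
    position: str = "any"             # "initial", "non_initial", or "any" (assumed reading of '*')
    exclude: frozenset = frozenset()  # consonants that "C" must not match


# A handful of rows from the table, transcribed by hand for this sketch.
RULES = (
    Rule("4.3.6 (1)", ("C", "+", "B"), ("C",), "initial"),
    Rule("4.3.6 (2)", ("C", "+", "B"), ("C", "C"), "non_initial", frozenset({"H"})),
    Rule("4.3.7 (1)", ("C", "+", "M"), ("C",), "initial"),
    Rule("4.3.2 (1)", ("A", "I"), ("O", "I")),
    Rule("4.3.2 (1)", ("A", "U"), ("O", "U")),
)


def match_at(tokens, i, rule):
    """Return (matched, bound_consonant) for `rule` at index `i` of `tokens`."""
    if i + len(rule.pattern) > len(tokens):
        return False, None
    bound = None
    for sym, tok in zip(rule.pattern, tokens[i:]):
        if sym == "C":
            if tok not in CONSONANTS or tok in rule.exclude:
                return False, None
            bound = tok
        elif sym != tok:
            return False, None
    return True, bound


def apply_rules(tokens):
    """Single left-to-right pass; the first matching rule at each position wins."""
    out, i = [], 0
    while i < len(tokens):
        for rule in RULES:
            if rule.position == "initial" and i != 0:
                continue
            if rule.position == "non_initial" and i == 0:
                continue
            matched, bound = match_at(tokens, i, rule)
            if matched:
                out.extend(bound if sym == "C" else sym for sym in rule.output)
                i += len(rule.pattern)
                break
        else:
            # No rule applies here: copy the symbol unchanged.
            out.append(tokens[i])
            i += 1
    return out


print(apply_rules(["B", "I", "S", "+", "B", "A"]))
print(apply_rules(["S", "+", "B", "A", "R"]))
```

The symbol sequences in the two calls are abstract test inputs rather than attested transcriptions. A fuller system would also need a rule ordering and repeated passes, since several rows (for example the hidden-অ rules of 4.3.2) can apply to the output of others; the table alone does not specify that ordering.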

Copyright information

© 2018 Springer Nature Singapore Pte Ltd

About this chapter

Cite this chapter

Datta, A.K. (2018). Phonological Rules for TTS. In: Epoch Synchronous Overlap Add (ESOLA). Signals and Communication Technology. Springer, Singapore. https://doi.org/10.1007/978-981-10-7016-7_4

  • DOI: https://doi.org/10.1007/978-981-10-7016-7_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7015-0

  • Online ISBN: 978-981-10-7016-7

  • eBook Packages: Engineering, Engineering (R0)
