Abstract
In speech synthesis, it is necessary to identify the graphemes in every word to be converted to speech. This chapter deals with this process, normally referred to as text-to-phoneme or grapheme-to-phoneme (G2P) conversion. Many rules for such conversion, known as phonology in linguistic parlance, have been proposed by eminent linguists for the dialect considered here, namely Standard Colloquial Bengali (SCB). Unfortunately, these rules are not in a computer-implementable form. This chapter presents the development of a rule-based G2P conversion system for SCB.
Appendix
The following rule table (an example of which is shown in Table 4.3) lists some of the compiled rules. In this table, a vowel ligature is represented by a “+” sign between the consonant and the vowel. Similarly, a consonant cluster is represented by a “+” sign between the two consonants. Where the position of the input string in the word matters, “*” is used to indicate it; in all other cases the position of the input string in the word is not important. Note also that the RDB table contains only ASCII strings in the “Input String” and “Output String” columns. In the present case, POS and semantic information are not used and hence are not shown in this table. The hidden অ (A) of any consonant C is represented by the string “A” immediately after it.
Applied phonological rule/rules | Input string | Output string | Meaning of special symbols, if any, present in 2nd and 3rd columns |
---|---|---|---|
4.3.2 (1) | AI (অই) /ɔi/ | OI (ওই) /oi/ | – |
4.3.2 (1) | AU (অউ) /ɔu/ | OU (ওউ) /ou/ | – |
4.3.2 (1) 4.3.2 (6) | AK+SA (অক্ষ) /ɔkSɔ/ | OK+SO (ওক্ষ) /okSo/ | – |
4.3.2 (2) | CAN /Cɔn/ | CON /Con/ | – |
4.3.2 (2) | CAN0 /Cɔɳ/ | CON0 /Coɳ/ | – |
4.3.2 (3) | CA /Cɔ/ | CO /Co/ | – |
4.3.2 (4) | CACA /CɔCɔ/ | CAC /CɔC/ | – |
4.3.2 (5) | *C+CA /C+Cɔ/ | *C+CO /C+Co/ | ‘*’ represents any one of ই (I), উ (U), ক্ষ (K+S1), or জ্ঞ (J+N1) |
4.3.2 (6) | C+CA /CCɔ/ | C+CO /CCo/ | – |
4.3.2 (8) | C+(R0+I)CA (C+ঋ ligature CA) /CriCɔ/ | C+(R0+I)CO (C+ঋ ligature CO) /CriCo/ | – |
4.3.2 (9) | CACA /CɔCɔ/ | CACO /CɔCo/ | – |
4.3.2 (10) | ACAC /ɔCɔC/ | ACOC /ɔCoC/ | – |
4.3.2 (10) | AACAC /ɐCɔC/ | AACOC /ɐCoC/ | – |
4.3.2 (10) | CACAC /CɔCɔC/ | CACOC /CɔCoC/ | – |
4.3.2 (10) | C+AACAC /CɐCɔC/ | C+AACOC /CɐCoC/ | – |
4.3.2 (11) | HAC+E /hɔle/ | HOC+E /hole/ | – |
4.3.2 (12) | CA*CH+E /Cɔ*tʃhe/ | CO*CH+E /Co*tʃhe/ | ‘*’ represents the middle part of the word |
4.3.2 (13) | C+RYA /Cryɔ/ | C+RAYA /Crɔyɔ/ | – |
4.3.2 (14) | C*YA /C*yɔ/ | C*YO /C*yo/ | ‘*’ represents any ligature other than ɐ (AA) |
4.3.2 (15) | C+AAYA /Cayɔ/ | C+AAY /Cay/ | – |
4.3.2 (16) | YAC+AA /yɔCɐ/ | YC+AA /yCɐ/ | – |
4.3.2 (17) | NGCA /ŋCɔ/ | NGCO /ŋCo/ | – |
4.3.2 (18) | EKAC /ekɔC/ | EEKAC /ækɔC/ | – |
4.3.2 (18) | EKAC+C /ekɔCC/ | EEKAC+C /ækɔCC/ | – |
4.3.3 (1) | EC* /eC*/ | EC* /eC*/ | ‘*’ represents vowel ই (I) or উ (U) |
4.3.3 (1) | EC* /eC*/ | EEC* /æC*/ | ‘*’ represents any vowel other than ই (I), উ (U) |
4.3.3 (1) | ECAC* /eCɔC*/ | ECAC* /eCɔC*/ | ‘*’ represents vowel ই (I) or উ (U) |
4.3.3 (1) | ECAC* /eCɔC*/ | EECAC* /æCɔC*/ | ‘*’ represents any vowel other than ই (I), উ (U) |
4.3.4 (1) | J+N1+V* /dzɳV*/ | G+V͂* /gV͂*/ | ‘*’ represents the rest of the word. V represents any vowel and V͂ represents the nasal counterpart of V |
4.3.4 (2) | *J+N1+V /*dzɳV/ | *GG+V͂ /*ggV͂/ | ‘*’ represents the previous part of the word. V represents any vowel and V͂ represents the nasal counterpart of V |
4.3.4 (3) | J+N1+AA* /dzɳa*/ | G+EE0* /gæ͂*/ | ‘*’ represents the rest of the word |
4.3.4 (4) | J+N1A* /dzɳɔ*/ | G+O0* /gõ*/ | ‘*’ represents the rest of the word |
4.3.4 (4) | *J+N1A /*dzɳɔ/ | *GG+O0 /*ggõ/ | ‘*’ represents the previous part of the word |
4.3.5 (1) | *C+Y+V /*CyV/ | *CC+Y+V /*CCyV/ | ‘*’ represents the previous part of the word. V represents any vowel |
4.3.5 (2) | *C+Y+AA /*Cyɐ/ | *CC+Y+EE /*CCyæ/ | ‘*’ represents the previous part of the word |
4.3.5 (2) | C+Y+AA* /Cya*/ | C+Y+EE* /Cyæ*/ | ‘*’ represents the rest of the word |
4.3.5 (3) | *C+YA /*Cyɔ/ | *CC+YO /*CCyo/ | ‘*’ represents the previous part of the word |
4.3.5 (4) | *H+Y+V /*hyV/ | *JJH+V /*dzdzhV/ | ‘*’ represents the previous part of the word. V represents any vowel |
4.3.6 (1) | C+B* /Cb*/ | C* /C*/ | ‘*’ represents the rest of the word |
4.3.6 (2) | *C+B /*Cb/ | *CC /*CC/ | ‘*’ represents the previous part of the word. Here C represents any consonant except H |
4.3.6 (3) | *VH+B /*Vhb/ | *VOBH /*Vobh/ | ‘*’ represents the previous part of the word. V represents vowel অ (A) or আ (AA) |
4.3.6 (3) | *IH+B /*ihb/ | *IUBH /*iubh/ | – |
4.3.6 (4) | *C+C+B /*CCb/ | *C+C /*CC/ | ‘*’ represents the previous part of the word |
4.3.7 (1) | C+M* /Cm*/ | C* /C*/ | ‘*’ represents the rest of the word |
4.3.7 (2) | *C+M+V /*CmV/ | *CC+V͂ /*CCV͂/ | ‘*’ represents the previous part of the word. Here C is a stop or sibilant. V is any vowel and V͂ is its nasal counterpart |
4.3.8 (1) | *C+R /*Cr/ | *CCR /*CCr/ | ‘*’ represents the previous part of the word |
4.3.9 (1) | CV /CV/ | CV͂ /CV͂/ | C is the consonant ম (M) or ন (N). V is any vowel and V͂ is its nasal counterpart |
4.3.10 (1) | S /ʃ/ | SH /Ç/ | – |
4.3.10 (2) | S+C /ʃC/ | S1+C /sC/ | Here C is the consonant ট (T0) or ঠ (TH0) |
4.3.10 (3) | S+C /ʃC/ | S+C /ʃC/ | Here C is any one of the consonants ত (T), থ (TH), ন (N), প (P), ফ (PH), র (R), ল (L), ক (K), খ (KH) |
4.3.10 (3) | S+C /ʃC/ | SH+C /ÇC/ | Here C is any consonant other than ত (T), থ (TH), ন (N), প (P), ফ (PH), র (R), ল (L), ক (K), খ (KH) |
4.3.11 (1) | V+ | V͂ /V͂/ | V is any vowel and V͂ is its nasal counterpart |
4.3.11 (2) | C++AA | C V͂ /C V͂/ | V is any vowel and V͂ is its nasal counterpart |
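
The way rows of this table might be applied to an ASCII input string can be sketched in a few lines of Python. This is only an illustrative sketch, not the chapter's implementation: the `Rule` class, the helper functions, the pre-split symbol lists, and the single left-to-right pass are all assumptions, and the test words are made-up symbol sequences rather than dictionary entries. Only the three rule rows themselves (4.3.10 (2), 4.3.2 (1), 4.3.2 (17)) are taken from the table.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    rule_id: str        # first column of the table
    pattern: tuple      # "Input string", already split into symbols
    output: tuple       # "Output string", already split into symbols
    classes: dict = field(default_factory=dict)  # class symbol -> allowed symbols

# Assumed, incomplete consonant inventory (for illustration only).
CONSONANTS = {"K", "KH", "T", "TH", "T0", "TH0", "N", "P", "PH", "R", "L", "M", "B", "H"}

RULES = [
    # 4.3.10 (2): S+C -> S1+C, where C is T0 or TH0
    Rule("4.3.10 (2)", ("S", "+", "C"), ("S1", "+", "C"), {"C": {"T0", "TH0"}}),
    # 4.3.2 (1): AI -> OI
    Rule("4.3.2 (1)", ("A", "I"), ("O", "I")),
    # 4.3.2 (17): NGCA -> NGCO, where C is any consonant
    Rule("4.3.2 (17)", ("NG", "C", "A"), ("NG", "C", "O"), {"C": CONSONANTS}),
]

def match_at(rule, word, i):
    """Return the symbols bound to class positions if the rule matches at i, else None."""
    segment = word[i:i + len(rule.pattern)]
    if len(segment) != len(rule.pattern):
        return None
    bound = []
    for p, w in zip(rule.pattern, segment):
        if p in rule.classes:
            if w not in rule.classes[p]:
                return None
            bound.append(w)
        elif p != w:
            return None
    return bound

def rewrite(rule, bound):
    """Build the output, filling class symbols with the matched symbols in order."""
    it = iter(bound)
    return [next(it) if s in rule.classes else s for s in rule.output]

def apply_rules(word, rules=RULES):
    """Single left-to-right pass; the first matching rule wins at each position."""
    out, i = [], 0
    while i < len(word):
        for rule in rules:
            bound = match_at(rule, word, i)
            if bound is not None:
                out.extend(rewrite(rule, bound))
                i += len(rule.pattern)
                break
        else:
            out.append(word[i])  # no rule applies: copy the symbol unchanged
            i += 1
    return out

print(apply_rules(["K", "A", "S", "+", "T0", "A"]))
print(apply_rules(["S", "+", "T", "A"]))
```

Working on symbol lists rather than raw strings avoids ambiguity between multi-character symbols such as A and AA. Rule ordering, interaction between rules, and the positional “*” marker are deliberately left out of this sketch.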
Copyright information
© 2018 Springer Nature Singapore Pte Ltd
Cite this chapter
Datta, A.K. (2018). Phonological Rules for TTS. In: Epoch Synchronous Overlap Add (ESOLA). Signals and Communication Technology. Springer, Singapore. https://doi.org/10.1007/978-981-10-7016-7_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7015-0
Online ISBN: 978-981-10-7016-7
eBook Packages: Engineering, Engineering (R0)