Phonological Rules for TTS

Datta, Asoke Kumar

doi:10.1007/978-981-10-7016-7_4

Part of the book series: Signals and Communication Technology ((SCT))

Abstract

In speech synthesis, it is necessary to identify the phonemes corresponding to the graphemes in every word to be converted to speech. This chapter deals with this process, normally referred to as text-to-phoneme or grapheme-to-phoneme conversion. Many rules for such conversion, known as phonology in linguistic parlance, have been proposed by eminent linguists for this dialect, namely Standard Colloquial Bengali (SCB). Unfortunately, these rules are not in a computer-implementable form. This chapter presents the development of a rule-based grapheme-to-phoneme (G2P) conversion system for SCB.

Author information

Authors and Affiliations

Society for Natural Language Technology Research (SNLTR), Kolkata, West Bengal, India
Asoke Kumar Datta

Appendix

The following rule table (an example of which is shown in Table 4.3) includes some of the compiled rules. In this table, a vowel ligature is represented by a “+” sign between the consonant and the vowel. Similarly, a consonant cluster is represented by a “+” sign between the two consonants. Where the position of the input string in the word matters, “*” is used to indicate that position; in all other cases the position of the input string in the word is not important. Note also that the RDB table contains only ASCII strings in the “Input String” and “Output String” columns. In the present case, POS and semantic information are not used and hence are not shown in this table. The hidden অ (A) of any consonant C is represented by the string “A” placed just after it.

Applied phonological rule/rules	Input string	Output string	Meaning of special symbols, if any, present in 2nd and 3rd columns
4.3.2 (1)	AI (অই) /ɔi/	OI (ওই) /oi/	–
4.3.2 (1)	AU (অউ) /ɔu/	OU (ওউ) /ou/	–
4.3.2 (1) 4.3.2 (6)	AK+SA (অক্ষ) /ɔkSɔ/	OK+SO (ওক্ষ) /okSo/	–
4.3.2 (2)	CAN /Cɔn/	CON /Con/	–
4.3.2 (2)	CAN0 /Cɔɳ/	CON0 /Coɳ/	–
4.3.2 (3)	CA /Cɔ/	CO /Co/	–
4.3.2 (4)	CACA /CɔCɔ/	CAC /CɔC/	–
4.3.2 (5)	*C+CA /C+Cɔ/	*C+CO /C+Co/	‘*’ represents any one of ই (I), উ (U), ক্ষ (K+S1), or জ্ঞ (J+N1)
4.3.2 (6)	C+CA /CCɔ/	C+CO /CCo/	–
4.3.2 (8)	C+(R0+I)CA (C+ঋ ligature CA) /CriCɔ/	C+(R0+I)CO (C+ঋ ligature CO) /CriCo/	–
4.3.2 (9)	CACA /CɔCɔ/	CACO /CɔCo/	–
4.3.2 (10)	ACAC /ɔCɔC/	ACOC /ɔCoC/	–
4.3.2 (10)	AACAC /ɐCɔC/	AACOC /ɐCoC/	–
4.3.2 (10)	CACAC /CɔCɔC/	CACOC /CɔCoC/	–
4.3.2 (10)	C+AACAC /CɐCɔC/	C+AACOC /CɐCoC/	–
4.3.2 (11)	HAC+E /hɔle/	HOC+E /hole/	–
4.3.2 (12)	CACH+E /Cɔtʃ^he/	COCH+E /Cotʃ^he/	‘*’ represents the middle part of the word
4.3.2 (13)	C+RYA /Cryɔ/	C+RAYA /Crɔyɔ/	–
4.3.2 (14)	CYA /Cyɔ/	CYO /Cyo/	‘*’ represents any ligature other than ɐ (AA)
4.3.2 (15)	C+AAYA /Cayɔ/	C+AAY /Cay/	–
4.3.2 (16)	YAC+AA /yɔCɐ/	YC+AA /yCɐ/	–
4.3.2 (17)	NGCA /ŋCɔ/	NGCO /ŋCo/	–
4.3.2 (18)	EKAC /ekɔC/	EEKAC /ækɔC/	–
4.3.2 (18)	EKAC+C /ekɔCC/	EEKAC+C /ækɔCC/	–
4.3.3 (1)	EC* /eC*/	EC* /eC*/	‘*’ represents vowel ই (I) or উ (U)
4.3.3 (1)	EC* /eC*/	EEC* /æC*/	‘*’ represents any vowel other than ই (I), উ (U)
4.3.3 (1)	ECAC* /eCɔC*/	ECAC* /eCɔC*/	‘*’ represents vowel ই (I) or উ (U)
4.3.3 (1)	ECAC* /eCɔC*/	EECAC* /æCɔC*/	‘*’ represents any vowel other than ই (I), উ (U)
4.3.4 (1)	J+N1+V* /dzɳV*/	G+V͂* /gV͂*/	‘*’ represents the rest of the word. V represents any vowel and V͂ represents the nasal counterpart of V
4.3.4 (2)	J+N1+V /dzɳV/	GG+V͂ /ggV͂/	‘*’ represents the previous part of the word. V represents any vowel and V͂ represents the nasal counterpart of V
4.3.4 (3)	J+N1+AA* /dzɳa*/	G+EE0* /gæ͂*/	‘*’ represents the rest of the word
4.3.4 (4)	J+N1A* /dzɳɔ*/	G+O0* /gõ*/	‘*’ represents the rest of the word
4.3.4 (4)	* J+N1A /*dzɳɔ/	GG+O0 /ggõ/	‘*’ represents the previous part of the word
4.3.5 (1)	C+Y+V /CyV/	CC+Y+V /CCyV/	‘*’ represents the previous part of the word. V represents any vowel
4.3.5 (2)	C+Y+AA /Cyɐ/	CC+Y+EE /CCyæ/	‘*’ represents the previous part of the word
4.3.5 (2)	C+Y+AA* /Cya*/	C+Y+EE /Cyæ*/	‘*’ represents the rest of the word
4.3.5 (3)	C+YA /Cyɔ/	CC+YO /CCyo/	‘*’ represents the previous part of the word
4.3.5 (4)	H+Y+V /hyV/	JJH+V /dzdz^hV/	‘*’ represents the previous part of the word. V represents any vowel
4.3.6 (1)	C+B* /Cb*/	C* /C*/	‘*’ represents the rest of the word
4.3.6 (2)	C+B /Cb/	CC /CC/	‘*’ represents the previous part of the word. Here C represents any consonant except H
4.3.6 (3)	VH+B /Vhb/	VOBH /Vob^h/	‘*’ represents the previous part of the word. V represents vowel অ (A) or আ (AA)
4.3.6 (3)	IH+B /ihb/	IUBH /iub^h/	–
4.3.6 (4)	C+C+B /CCb/	C+C /CC/	‘*’ represents the previous part of the word
4.3.7 (1)	C+M* /Cm*/	C* /C*/	‘*’ represents the rest of the word
4.3.7 (2)	C+M+V /CCV/	*CC+V͂ /CCV͂/	‘*’ represents the previous part of the word. Here C is a stop or sibilant. V is any vowel and V͂ is its nasal counterpart
4.3.8 (1)	C+R /Cr/	CCR /CCr/	‘*’ represents the previous part of the word
4.3.9 (1)	CV /CV/	*CV͂* /*CV͂*/	C is the consonant ম (M) or ন (N). V is any vowel and V͂ is its nasal counterpart
4.3.10 (1)	S /ʃ/	SH /Ç/	–
4.3.10 (2)	S+C /ʃC/	S1+C /sC/	Here C is the consonant ট (T0) or ঠ (TH0)
4.3.10 (3)	S+C /ʃC/	S+C /ʃC/	Here C is any one of the consonants ত (T), থ (TH), ন (N), প (P), ফ (PH), র (R), ল (L), ক (K), খ (KH)
4.3.10 (3)	S+C /ʃC/	SH+C /ÇC/	Here C is any consonant other than ত (T), থ (TH), ন (N), প (P), ফ (PH), র (R), ল (L), ক (K), খ (KH)
4.3.11 (1)	V+	V͂ /V͂/	V is any vowel and V͂ is its nasal counterpart
4.3.11 (2)	C++AA	C V͂ /C V͂/	V is any vowel and V͂ is its nasal counterpart
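
One way such a rule table might be applied in software is sketched below in Python. This is only an illustration under stated assumptions, not the chapter's implementation: words are assumed to be already split into transliteration tokens, the consonant inventory is a small hypothetical subset, the names Rule, match_at and apply_rule are invented for the example, and the "*" marker is encoded as a position field with the values "initial", "non_initial" or "any". The order in which rules are tried is not stated in the table, so the sketch applies one rule at a time to the leftmost permitted match.

```python
from dataclasses import dataclass

# Hypothetical, deliberately incomplete consonant inventory in the ASCII
# transliteration; the chapter's full symbol set is not reproduced here.
CONSONANTS = {"K", "KH", "G", "J", "T", "TH", "N", "P", "PH",
              "B", "M", "R", "L", "S", "H"}


@dataclass(frozen=True)
class Rule:
    label: str             # rule number as cited in the table
    pattern: tuple         # input tokens; "C" stands for any consonant
    output: tuple          # output tokens; each "C" copies a matched consonant
    position: str = "any"  # assumed encoding of "*": initial / non_initial / any


def match_at(rule, tokens, i):
    """Return the consonants bound to "C" if the rule matches at index i, else None."""
    if i + len(rule.pattern) > len(tokens):
        return None
    bound = []
    for sym, tok in zip(rule.pattern, tokens[i:]):
        if sym == "C":
            if tok not in CONSONANTS:
                return None
            bound.append(tok)
        elif sym != tok:
            return None
    return bound


def apply_rule(rule, tokens):
    """Rewrite the leftmost permitted match; return tokens unchanged if none."""
    n = len(rule.pattern)
    if rule.position == "initial":
        starts = [0]
    elif rule.position == "non_initial":
        starts = range(1, len(tokens) - n + 1)
    else:
        starts = range(0, len(tokens) - n + 1)
    for i in starts:
        bound = match_at(rule, tokens, i)
        if bound is None:
            continue
        rewritten, k = [], 0
        for sym in rule.output:
            if sym == "C":
                # The k-th output "C" copies the k-th matched consonant; extra
                # "C"s repeat the last one (assumed reading of gemination, CC).
                rewritten.append(bound[min(k, len(bound) - 1)])
                k += 1
            else:
                rewritten.append(sym)
        return tokens[:i] + rewritten + tokens[i + n:]
    return tokens


# Three rows of the table, transcribed by hand into the assumed encoding.
CAN_TO_CON = Rule("4.3.2 (2)", ("C", "A", "N"), ("C", "O", "N"))
DROP_INITIAL_B = Rule("4.3.6 (1)", ("C", "+", "B"), ("C",), "initial")
GEMINATE_B = Rule("4.3.6 (2)", ("C", "+", "B"), ("C", "C"), "non_initial")

print(apply_rule(CAN_TO_CON, ["M", "A", "N"]))                    # ['M', 'O', 'N']
print(apply_rule(DROP_INITIAL_B, ["S", "+", "B", "A", "R"]))      # ['S', 'A', 'R']
print(apply_rule(GEMINATE_B, ["B", "I", "S", "+", "B", "A"]))     # ['B', 'I', 'S', 'S', 'A']
```

Working on token lists rather than raw strings sidesteps the ambiguity of multi-character symbols such as AA, KH or N0 in the ASCII transliteration; a real system would also need a tokenizer and a policy for rule ordering, neither of which is specified in the table.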

About this chapter

Cite this chapter

Datta, A.K. (2018). Phonological Rules for TTS. In: Epoch Synchronous Overlap Add (ESOLA). Signals and Communication Technology. Springer, Singapore. https://doi.org/10.1007/978-981-10-7016-7_4

DOI: https://doi.org/10.1007/978-981-10-7016-7_4
Published: 30 December 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7015-0
Online ISBN: 978-981-10-7016-7
eBook Packages: Engineering, Engineering (R0)
