A Hybrid Approach to Pattern Matching for Text-to-Speech Conversion
Assigning phonetic symbols to characters in a text-to-speech conversion system is a pattern analysis and recognition process. This research proposes a hybrid approach to pattern matching for speech synthesis by machine. The problem can be stated as follows: how do we assign a phoneme to a character given its contextual information, i.e. the characters preceding and following it? In the present study, we use a contextual window five characters wide, allowing up to two characters on either side of the character to which a phoneme is being assigned. The assignment method is based on machine learning: the system is trained on a large set of examples. The hybrid approach integrates an information gain learning algorithm with a transformation-based, error-driven learning algorithm. The training and testing examples in the present work are taken from the NETtalk corpus, which contains 20,008 English words, each with a phonetic transcription. This hybrid approach has been shown to achieve a final accuracy of 96.86%.
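The five-character window and the information gain measure mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The padding symbol `PAD`, the function names, and the idea of scoring each window position by its information gain are assumptions made for illustration; the abstract does not say how word boundaries are handled or how the gain values are used. The transformation-based, error-driven stage is not shown.

```python
import math
from collections import Counter

PAD = "_"  # hypothetical filler for window positions that fall outside the word


def context_windows(word, radius=2):
    """Return one window of 2*radius + 1 characters per character of `word`."""
    padded = PAD * radius + word + PAD * radius
    width = 2 * radius + 1
    return [padded[i:i + width] for i in range(len(word))]


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(windows, phonemes, position):
    """Reduction in phoneme entropy from knowing the character at `position`."""
    groups = {}
    for window, phoneme in zip(windows, phonemes):
        groups.setdefault(window[position], []).append(phoneme)
    remainder = sum(len(g) / len(phonemes) * entropy(g)
                    for g in groups.values())
    return entropy(phonemes) - remainder


# Each character of a word is centred in its own five-character window.
print(context_windows("cat"))  # ['__cat', '_cat_', 'cat__']

# Toy windows and labels, invented for illustration (not NETtalk data):
# the label depends only on the character at index 3.
toy_windows = ["aaxaa", "aaxbb", "aaxaa", "aaxbb"]
toy_labels = ["P", "Q", "P", "Q"]
print(information_gain(toy_windows, toy_labels, 3))  # 1.0
print(information_gain(toy_windows, toy_labels, 0))  # 0.0
```

In the toy data, the position that determines the label has a gain of one bit and a position that never varies has a gain of zero, which is the kind of signal an information gain learner can use to decide which context positions matter most.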