A Hybrid Approach to Pattern Matching for Text-to-Speech Conversion

  • Chew Lim Tan
  • Yan Rong Chen
  • Paul Hong Jyh Wu
Conference paper


Assigning phonetic symbols to characters in a text-to-speech conversion system is a pattern analysis and recognition process. This research proposes a hybrid approach to pattern matching for speech synthesis by machine. The problem can be stated as follows: how do we assign a phoneme to a character given its contextual information, i.e. the characters preceding and following it? In the present study, we use a contextual window five characters wide, allowing up to two characters on either side of the character in question. Phoneme assignment is based on machine learning: the system is trained on a large set of examples. The hybrid approach integrates an information gain learning algorithm with a transformation-based, error-driven learning algorithm. The training and testing examples are taken from the NETtalk corpus, which contains 20,008 English words, each with a phonetic transcription. This hybrid approach has been shown to achieve a final accuracy of 96.86%.
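
The five-character window and the information gain criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation. The padding symbol, the function names, and the toy lexicon (including its phoneme symbols and the use of "-" for a silent letter) are assumptions made for illustration only and are not taken from the NETtalk corpus. The sketch assumes each word is aligned one-to-one with its transcription, and it covers only the information gain part; the transformation-based stage is not shown.

```python
import math
from collections import Counter, defaultdict

PAD = "_"  # assumed padding symbol for positions beyond the word boundary


def windows(word, phonemes, half_width=2):
    """Yield (context, phoneme) pairs, one per character of the word.

    With half_width=2 each context is five characters wide and the
    character being transcribed sits in the middle (index 2).
    """
    if len(word) != len(phonemes):
        raise ValueError("word and transcription must be aligned one-to-one")
    padded = PAD * half_width + word + PAD * half_width
    for i, phoneme in enumerate(phonemes):
        yield padded[i:i + 2 * half_width + 1], phoneme


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(examples, position):
    """Entropy reduction obtained by splitting on one window position."""
    groups = defaultdict(list)
    for context, phoneme in examples:
        groups[context[position]].append(phoneme)
    remainder = sum(
        len(group) / len(examples) * entropy(group)
        for group in groups.values()
    )
    return entropy([phoneme for _, phoneme in examples]) - remainder


# Invented toy lexicon (NOT NETtalk data); "-" marks a silent letter.
TOY_LEXICON = [("cat", "kat"), ("cot", "kot"), ("city", "siti"), ("ice", "is-")]

examples = [pair for word, phon in TOY_LEXICON for pair in windows(word, phon)]

# Information gain of each of the five window positions on the toy data.
gains = {pos: information_gain(examples, pos) for pos in range(5)}
```

Ranking the window positions by `gains` indicates which context characters are most informative about the phoneme on this data. A learner of the kind described might use such a ranking to decide which positions to test first, although the abstract does not specify how the two algorithms are combined.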


Hybrid Approach, Information Gain, Testing Pattern, Transformation Rule, Speech Synthesis





Copyright information

© Springer-Verlag London Limited 1999

Authors and Affiliations

  • Chew Lim Tan (1)
  • Yan Rong Chen (1)
  • Paul Hong Jyh Wu (2)

  1. School of Computing, National University of Singapore, Kent Ridge, Singapore
  2. Kent Ridge Digital Labs, Kent Ridge, Singapore
