A Hybrid Approach to Pattern Matching for Text-to-Speech Conversion

  • Chew Lim Tan
  • Yan Rong Chen
  • Paul Hong Jyh Wu
Conference paper

Abstract

Assigning phonetic symbols to characters in a text-to-speech conversion system is a pattern analysis and recognition process. This research proposes a hybrid approach to pattern matching for speech synthesis by machine. The problem can be stated as follows: how do we assign a phoneme to a character given its context, i.e. the characters preceding and following it? In the present study, we use a context window five characters wide, allowing up to two characters on either side of the character to which a phoneme is being assigned. The assignment method is based on machine learning: the system is trained on a large set of examples. The hybrid approach integrates an information gain learning algorithm with a transformation-based error-driven learning algorithm. The training and test examples are taken from the NETtalk corpus, which contains 20,008 English words, each with a phonetic transcription. The hybrid approach has been shown to achieve a final accuracy of 96.86%.
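
The five-character window and the information gain measure mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each word is aligned one-to-one with its phoneme string, and the padding symbol `_`, the function names, and the toy transcriptions in the usage note are hypothetical choices made for illustration only. The transformation-based error-driven stage is not covered.

```python
import math
from collections import Counter, defaultdict

PAD = "_"  # hypothetical padding symbol for positions beyond the word boundary


def context_windows(word, phonemes, half_width=2):
    """Yield (window, phoneme) pairs, one per character of `word`.

    With half_width=2 each window is five characters wide: the focus
    character plus up to two characters on either side.
    """
    if len(word) != len(phonemes):
        raise ValueError("word and phoneme string must be aligned one-to-one")
    padded = PAD * half_width + word + PAD * half_width
    for i, phoneme in enumerate(phonemes):
        yield padded[i:i + 2 * half_width + 1], phoneme


def entropy(labels):
    """Shannon entropy (in bits) of a list of phoneme labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(examples, position):
    """Reduction in phoneme entropy from knowing the character at `position`."""
    labels = [phoneme for _, phoneme in examples]
    by_value = defaultdict(list)
    for window, phoneme in examples:
        by_value[window[position]].append(phoneme)
    remainder = sum(
        len(group) / len(labels) * entropy(group) for group in by_value.values()
    )
    return entropy(labels) - remainder
```

As a usage example, `list(context_windows("cat", "k@t"))` produces the windows `__cat`, `_cat_` and `cat__`, paired with the made-up phoneme symbols `k`, `@` and `t`. Calling `information_gain(examples, position)` for each of the five positions then ranks them by how much they reduce uncertainty about the phoneme. In a sketch like this the focus position (index 2) would normally be expected to rank highest, although that depends on the data.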

Copyright information

© Springer-Verlag London Limited 1999

Authors and Affiliations

  • Chew Lim Tan (1)
  • Yan Rong Chen (1)
  • Paul Hong Jyh Wu (2)

  1. School of Computing, National University of Singapore, Kent Ridge, Singapore
  2. Kent Ridge Digital Labs, Kent Ridge, Singapore
