Selection of Multiphone Synthesis Units and Grapheme-to-Phoneme Transcription Using Variable-Length Modeling of Strings
Language can be viewed as the result of a complex encoding process that maps a message into a stream of symbols: phonemes, graphemes, morphemes, words, and so on, depending on the level of representation. At each level, specific constraints (phonotactic, morphological or grammatical) apply, greatly reducing the possible combinations of symbols and introducing statistical dependencies between them. Numerous probabilistic models have been developed in speech and language processing to capture these dependencies. In this chapter, we explore the potential of the multigram model to learn variable-length dependencies in strings of phonemes and in strings of graphemes. In the multigram approach described here, a string of symbols is viewed as a concatenation of independent variable-length subsequences of symbols. The ability of the multigram model to learn relevant subsequences of phonemes is illustrated by the selection of multiphone units for speech synthesis.
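As an illustration of the idea that a string is a concatenation of independent variable-length subsequences, the following is a minimal sketch, not the chapter's implementation. It assumes the subsequence probabilities are already known and that the probability of a segmentation is the product of the probabilities of its units; it then finds the most likely segmentation by dynamic programming. The function name `best_segmentation`, the `max_len` parameter and the toy probabilities are hypothetical, chosen only for this example.

```python
import math


def best_segmentation(symbols, unit_probs, max_len=3):
    """Return (log_prob, units) for the most likely segmentation of `symbols`.

    Assumes units are independent, so a segmentation's probability is the
    product of its units' probabilities (a sum in the log domain).
    """
    n = len(symbols)
    # best[i]: best log-probability of any segmentation of symbols[:i]
    best = [-math.inf] * (n + 1)
    # back[i]: start index of the last unit in that best segmentation
    back = [0] * (n + 1)
    best[0] = 0.0

    for end in range(1, n + 1):
        # Try every candidate last unit of length 1..max_len ending at `end`.
        for start in range(max(0, end - max_len), end):
            p = unit_probs.get(symbols[start:end], 0.0)
            if p <= 0.0 or best[start] == -math.inf:
                continue  # unknown unit, or prefix cannot be segmented
            score = best[start] + math.log(p)
            if score > best[end]:
                best[end] = score
                back[end] = start

    if best[n] == -math.inf:
        return -math.inf, []  # no segmentation uses only known units

    # Follow the back-pointers from the end of the string to recover units.
    units = []
    end = n
    while end > 0:
        start = back[end]
        units.append(symbols[start:end])
        end = start
    units.reverse()
    return best[n], units


# Toy unit probabilities (hypothetical values, for illustration only).
toy_probs = {"a": 0.2, "b": 0.2, "ab": 0.5, "ba": 0.1}
log_prob, units = best_segmentation("abab", toy_probs)
print(units)
```

With these toy values, the segmentation into two "ab" units has probability 0.25, higher than any segmentation that uses shorter units, so the longer recurring subsequence is preferred. In practice the unit probabilities would themselves have to be estimated from data, which this sketch does not attempt.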
Keywords: Speech Synthesis, Speech Unit, Pronunciation Dictionary, Bigram Model, Acoustic Unit