Abstract
Vowel recognition is essential in Chinese speech recognition, especially in speaker-independent tasks. In this paper, the authors argue that segmenting the speech signal into fixed-length frames causes feature extraction to lose essential features and to introduce irrelevant information, so that the extracted features may be less expressive and less consistent. Using pitch-based, dynamically adaptive frames improves feature extraction, so that the features are more expressive of the phonemes to be recognized and more consistent across speakers. The algorithm for dynamically segmenting the speech signal is discussed, and a variety of features are tested with the pitch-based adaptive frames. A new type of feature, the FFT magnitude pattern, proves very expressive and consistent and may help to simplify recognition models. With FFT magnitude patterns, definite algorithms can be adopted at the recognition stage, which simplifies the calculation and speeds up the process. The experiments use a finite-state machine model. The results show that pitch-based FFT magnitude patterns are more expressive and consistent than the other features tested and are suitable for speaker-independent Chinese speech recognition tasks.
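The idea of pitch-based framing can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify: the autocorrelation pitch estimator, the function names, the synthetic vowel-like signal, and the reading of "FFT magnitude pattern" as the magnitude spectrum of a single pitch-length frame are all assumptions made for illustration. A plain DFT stands in for the FFT so that the example needs only the standard library.

```python
import cmath
import math

def estimate_pitch_period(signal, min_lag, max_lag):
    """Hypothetical helper: pick the lag with the highest autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(signal[i] * signal[i + lag]
                    for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def pitch_frames(signal, period):
    """Cut the signal into consecutive frames of exactly one pitch period."""
    return [signal[start:start + period]
            for start in range(0, len(signal) - period + 1, period)]

def dft_magnitudes(frame):
    """Magnitude spectrum of one frame (bins 0 .. n/2), computed by a direct DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

# Synthetic "vowel": a fundamental with a 50-sample period plus a weaker
# second harmonic (assumed test signal, not data from the paper).
TRUE_PERIOD = 50
signal = [math.sin(2 * math.pi * i / TRUE_PERIOD)
          + 0.5 * math.sin(4 * math.pi * i / TRUE_PERIOD)
          for i in range(400)]

period = estimate_pitch_period(signal, min_lag=20, max_lag=100)
frames = pitch_frames(signal, period)
pattern = dft_magnitudes(frames[0])

print("estimated period:", period)
print("frames:", len(frames))
print("first bins:", [round(m, 3) for m in pattern[:4]])
```

Because each frame spans one pitch period, harmonic k of the voice falls on DFT bin k whatever the speaker's pitch. Under the assumptions above, this is one plausible reason such patterns would be more consistent across speakers than spectra taken from fixed-length frames.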
This work was supported by the Trans-Century Training Programme Foundation for the Talents by the State Education Commission, China.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baolin, Y., Tao, Y. (1997). Vowel recognition for speaker independent Chinese speech recognition. In: Sattar, A. (eds) Advanced Topics in Artificial Intelligence. AI 1997. Lecture Notes in Computer Science, vol 1342. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63797-4_83
DOI: https://doi.org/10.1007/3-540-63797-4_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63797-4
Online ISBN: 978-3-540-69649-0
eBook Packages: Springer Book Archive