Abstract
This paper describes the significant problems in automatically mapping acoustic features into phonemic labels, the task of the phonemic (phonetic) block in an automatic speech recognition system. These problems are feature parameters, segmentation, labeling, co-articulation, and speaker differences. We also discuss some general approaches to the language-independent problems, especially pattern matching techniques for labeling, and describe the approach taken in the LITHAN speech understanding system. Finally, since these problems depend on each other, we emphasize that a system should have adaptive functions for the various factors that give rise to the variability of speech.
Copyright information
© 1980 D. Reidel Publishing Company
About this paper
Cite this paper
Sakai, T. (1980). Automatic Mapping of Acoustic Features into Phonemic Labels. In: Simon, J.C. (ed.) Spoken Language Generation and Understanding. NATO Advanced Study Institutes Series, vol 59. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-9091-3_8
DOI: https://doi.org/10.1007/978-94-009-9091-3_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-009-9093-7
Online ISBN: 978-94-009-9091-3
eBook Packages: Springer Book Archive