Automatic Mapping of Acoustic Features into Phonemic Labels

Sakai, Toshiyuki

doi:10.1007/978-94-009-9091-3_8

Part of the book series: NATO Advanced Study Institutes Series (ASIC, volume 59)

Abstract

This paper describes the significant problems in automatically mapping acoustic features into phonemic labels, the task of the phonemic (phonetic) block in an automatic speech recognition system. These problems are feature parameters, segmentation, labeling, co-articulation, and speaker differences. We also discuss some general approaches to the language-independent problems, especially pattern matching techniques for labeling, and describe our approach in the LITHAN speech understanding system. Finally, since these problems depend on one another, we emphasize that a system should have adaptive functions for the various factors that give rise to the variability of speech.

Author information

Authors and Affiliations

Department of Information Science, Kyoto University, Japan
Toshiyuki Sakai

Editor information

Editors and Affiliations

Institut de Programmation, Université Pierre et Marie Curie, Paris VI, France
J. C. Simon

Cite this paper

Sakai, T. (1980). Automatic Mapping of Acoustic Features into Phonemic Labels. In: Simon, J.C. (eds) Spoken Language Generation and Understanding. NATO Advanced Study Institutes Series, vol 59. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-9091-3_8

DOI: https://doi.org/10.1007/978-94-009-9091-3_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-009-9093-7
Online ISBN: 978-94-009-9091-3
eBook Packages: Springer Book Archive
