Automatic Mapping of Acoustic Features into Phonemic Labels

  • Conference paper
Spoken Language Generation and Understanding

Part of the book series: NATO Advanced Study Institutes Series (ASIC, volume 59)

Abstract

This paper describes the significant problems in automatically mapping acoustic features into phonemic labels in the phonemic (phonetic) block of an automatic speech recognition system. These problems are feature parameters, segmentation, labeling, co-articulation, and speaker differences. We also discuss some general approaches to the language-independent problems, especially pattern matching techniques for labeling, and describe our approach in the LITHAN speech understanding system. Finally, since these problems depend on each other, we emphasize that a system should have adaptive functions for the various factors that give rise to the variability of speech.

Copyright information

© 1980 D. Reidel Publishing Company

About this paper

Cite this paper

Sakai, T. (1980). Automatic Mapping of Acoustic Features into Phonemic Labels. In: Simon, J.C. (ed.) Spoken Language Generation and Understanding. NATO Advanced Study Institutes Series, vol 59. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-9091-3_8

  • DOI: https://doi.org/10.1007/978-94-009-9091-3_8

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-009-9093-7

  • Online ISBN: 978-94-009-9091-3

  • eBook Packages: Springer Book Archive
