Articulatory Feature Extraction from Voice and Their Impact on Hybrid Acoustic Models

  • Jorge Lombart
  • Antonio Miguel
  • Eduardo Lleida
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)


The speech signal carries a great deal of information that current speech recognizers do not fully exploit. In this paper, articulatory information is extracted from speech and fused with standard acoustic models to obtain a hybrid acoustic model that improves speech recognition. The paper also studies which input signal, in terms of speech feature type and time resolution, yields the best articulatory information extractor. This information is then fused with a standard acoustic model built with neural networks to perform speech recognition, achieving better recognition results.
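The abstract does not say how the fusion is performed, so the following is only a minimal sketch under stated assumptions: per-frame articulatory posteriors are simply concatenated with context-stacked acoustic features to form the input of a hybrid model. The function names `stack_context` and `fuse_features`, the edge-padding scheme, and the toy values are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Frame = Sequence[float]


def stack_context(frames: Sequence[Frame], left: int, right: int) -> List[List[float]]:
    """Stack each frame with its neighbours to widen the time resolution.

    Assumption: edges are padded by repeating the first or last frame.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window: List[float] = []
        for k in range(t - left, t + right + 1):
            idx = min(max(k, 0), n - 1)  # clamp to the valid frame range
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def fuse_features(acoustic: Sequence[Frame], articulatory: Sequence[Frame]) -> List[List[float]]:
    """Fuse two frame-aligned streams by concatenation (an assumed scheme)."""
    if len(acoustic) != len(articulatory):
        raise ValueError("streams must have the same number of frames")
    return [list(a) + list(b) for a, b in zip(acoustic, articulatory)]


# Toy data: 3 frames of 2 acoustic coefficients and 2 articulatory posteriors.
acoustic_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
articulatory_posteriors = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]

stacked = stack_context(acoustic_frames, left=1, right=1)
hybrid_input = fuse_features(stacked, articulatory_posteriors)
```

Each row of `hybrid_input` would then be fed to the acoustic model. Other fusion schemes, such as combining the outputs of two separate networks, are equally compatible with the abstract's wording.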


Keywords: Articulatory features; Neural network; Hybrid models





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jorge Lombart (1)
  • Antonio Miguel (1)
  • Eduardo Lleida (1)
  1. ViVoLab, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain
