Predictive models for sequence modelling, application to speech and character recognition

  • P. Gallinari
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1387)


We have described a series of predictive models developed to capture dependencies within nonstationary sequences. Although the motivations and sources of inspiration for these models vary, they share the same goal. Other approaches exist that we have not described here. An important class of models that uses parametric trajectories is that of Segment Models; a review and a comparison with HMMs may be found in [37]. Up to now, predictive models have not led to better results than classical multi-Gaussian HMMs. Most of the time, the experiments reported by the different authors are performed on small or limited-complexity problems. However, some authors also report excellent performance of some predictive models on different tasks. In the second part of the paper, we have described a nonlinear predictive HMM based on regressive neural networks. We have presented experiments on two relatively large tasks where the model reaches state-of-the-art performance.


Hidden Markov Model, Word Recognition, Speech Recognition, Speech Signal, Speaker Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Furui S., 1986, Speaker independent isolated word recognition using dynamic features of speech spectrum, IEEE T. ASSP, 34, 52–59.
  2. Poritz A.B., 1982, Linear predictive HMMs and the speech signal, ICASSP, Vol. 2, 1291–1294.
  3. Wellekens C., 1987, Explicit time correlation in hidden Markov models for speech recognition, ICASSP'87, pp. 384–386.
  4. Brown P.F., 1987, The acoustic modeling problem in automatic speech recognition, PhD thesis, Carnegie Mellon University.
  5. Juang B.H., Rabiner L.R., 1985, Mixture autoregressive hidden Markov models for speech signals, IEEE T. ASSP, Vol. 33, No. 6, pp. 1404–1413, Dec.
  6. Kenny P., Lennig M., Mermelstein P., 1990, A linear predictive HMM for vector-valued observations with application to speech recognition, IEEE Trans. on Acoustics Speech and Signal Processing, ASSP-38, 2, pp. 220–225.
  7. Woodland P.C., 1992, Hidden Markov models using vector linear predictors and discriminative output distributions, ICASSP'92, pp. 509–512.
  8. Tishby N., 1991, On the application of mixture AR HMMs to text-independent speaker recognition, IEEE Trans. on Signal Processing, Vol. 39, No. 3, March 91.
  9. Kawabata T., 1993, Speaker-independent speech recognition using nonlinear predictor codebooks, ICASSP.
  10. Artières T., Gallinari P., 1995, Multi-state predictive neural models for text-independent speaker identification, Eurospeech 95.
  11. Mellouk A., Gallinari P., 1993, A discriminative neural prediction system for speech recognition, ICASSP 93, pp. II 553–536.
  12. Deng L., Hassanein H., Elmasry M., 1994, Analysis of the correlation structure for a neural predictive model with application to speech recognition, Neural Networks, Vol. 7, No. 2, 331–339.
  13. Bianchini M., Frasconi P., Gori M., 1995, Learning in multilayered networks used as autoassociators, IEEE Transactions on Neural Networks, Vol. 6, No. 2, 512–514.
  14. Artières T., 1995, Approches prédictives neuronales: application à l'identification du locuteur, Thèse de doctorat, Université de Paris Sud (in French).
  15. Tebelskis J., Waibel A., Petek B., Schmidbauer O., 1991, Continuous speech recognition using linked predictive neural networks, ICASSP 91, pp. 61–64.
  16. Iso K., Watanabe T., 1990, Speaker-independent word recognition using a neural prediction model, ICASSP.
  17. Iso K., Watanabe T., 1991, Large vocabulary speech recognition using neural prediction model, ICASSP 91, pp. 57–60.
  18. Petek B., Waibel A., Tebelskis J., 1992, Integrated and phoneme-function word architecture of hidden control neural networks for continuous speech recognition, Speech Communication, Special Issue on Eurospeech, Vol. 11, No. 2, pp. 273–282.
  19. Levin E., 1993, Hidden control neural architecture modeling of non linear time varying systems and its applications, IEEE Trans. on NN, Vol. 4.
  20. Tsuboka E., Takada Y., Wakita H., 1990, Neural predictive hidden Markov model, ICSLP.
  21. Rabiner L., Juang B.H., 1993, Fundamentals of speech recognition, Prentice Hall.
  22. Deng L., Aksmanovic M., Sun X., 1994, Speech recognition using hidden Markov models with polynomial functions as nonstationary states, IEEE Trans. SAP, 507–520.
  23. Hattori H., 1992, Text independent speaker recognition using neural networks, ICASSP, II 153–156.
  24. Mellouk A., Gallinari P., 1994, Discriminative training for improved neural prediction system, ICASSP 94, pp. 1233–1236.
  25. Mellouk A., Gallinari P., 1995, Global discrimination for neural predictive systems based on N-Best algorithm, ICASSP'95.
  26. Rao T.S., The fitting of nonstationary time series model with time dependent parameters, J. R. S. S. Series B, Vol. 32, No. 2, 312–322.
  27. Liporace L.A., 1975, Linear estimation of non stationary signals, J. Acoust. Soc. Amer., Vol. 58, No. 6, 1288–1295.
  28. Grenier Y., 1983, Time-dependent ARMA modeling of non stationary signals, IEEE T. ASSP, Vol. 31, No. 4, 899–911.
  29. Gish H., Ng K., 1993, A segmental speech model with applications to word spotting, ICASSP'93, II-447–450.
  30. Deng L., 1993, A stochastic model of speech incorporating hierarchical non-stationarity, IEEE T. SAP, Vol. 1, No. 4, 471–474.
  31. Deng L., Rathinavelu C., 1995, A Markov model containing state-conditioned second order non-stationarity: application to speech recognition, Comp. Speech and Lang., 9, 63–86.
  32. Garcia-Salicetti, Dorizzi B., Gallinari P., Wimmer Z., 1996, Adaptive discrimination in an HMM based neural predictive system for on-line word recognition, ICPR-96.
  33. Robinson T., 1991, Several improvements to a recurrent error propagation network phone recognition system, Tech. Rep. CUED/F-INFENG/TR.82, Cambridge Univ. Eng. Dept, Sept.
  34. Lee K.F., Hon H-W., 1989, Speaker-independent phone recognition using hidden Markov models, IEEE Trans. ASSP, Vol. 37, No. 11, 1641–1648.
  35. Manke S., Finke M., Waibel A., 1995, NPen++: a writer independent large vocabulary on line hand-writing recognition system, ICDAR'95, 403–408.
  36. Schwartz R., Chow Y.L., 1990, The N-Best algorithm: An efficient and exact procedure for finding the N most likely hypotheses, ICASSP 90, pp. 81–84.
  37. Ostendorf M., Digalakis V., Kimball O.A., 1996, From HMM's to segment models: a unified view of stochastic modelling for speech recognition, IEEE T. SAP, Vol. 4, No. 5, 360–378.

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • P. Gallinari (1)
  1. LIP 6, Université Paris 6, Paris cedex 5, France
