
Probabilistic Interpretation of Neural Networks for the Classification of Vectors, Sequences and Graphs

Chapter in: Innovations in Neural Information Paradigms and Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 247)

Abstract

This chapter introduces a probabilistic interpretation of artificial neural networks (ANNs), moving the focus from posterior probabilities to probability density functions (pdfs). Parametric and non-parametric neural-based algorithms for the unsupervised estimation of pdfs, relying on maximum-likelihood or Parzen-window techniques, are reviewed. These approaches may overcome the limitations of traditional statistical estimation methods, possibly leading to improved pdf models. Two paradigms for combining ANNs and hidden Markov models (HMMs) for sequence recognition are then discussed. These models rely on (i) an ANN that estimates state posteriors under a maximum-a-posteriori criterion, or (ii) a connectionist estimation of emission pdfs, featuring global optimization of the HMM and ANN parameters under a maximum-likelihood criterion. Finally, the chapter addresses the classification of graphs (structured data) by presenting a connectionist probabilistic model for the posterior probability of classes given a labeled graphical pattern. In all cases, empirical evidence and theoretical arguments underline the fact that plausible probabilistic interpretations of ANNs are viable and may lead to improved statistical classifiers, not only in the static but also in the sequential and structured pattern recognition setups.
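The Parzen-window estimation mentioned above can be illustrated with a minimal one-dimensional sketch (this is a generic textbook form of the estimator with a Gaussian kernel, not the chapter's neural-based variant; the function name, bandwidth, and simulated data are ours):

```python
import numpy as np

def parzen_pdf(x, samples, h):
    """Parzen-window estimate of p(x) from i.i.d. scalar samples,
    averaging a Gaussian kernel of bandwidth h centred on each sample."""
    kernel = np.exp(-0.5 * ((x - samples) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    return kernel.mean()

# Estimate a standard-normal density from 1000 simulated samples.
rng = np.random.default_rng(0)
data = rng.standard_normal(1000)
density_at_zero = parzen_pdf(0.0, data, h=0.3)
```

The estimate is non-parametric: no functional form of the density is assumed, and the bandwidth `h` trades bias against variance. The chapter's connectionist approach replaces this explicit kernel sum with a trained network.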





Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Trentin, E., Freno, A. (2009). Probabilistic Interpretation of Neural Networks for the Classification of Vectors, Sequences and Graphs. In: Bianchini, M., Maggini, M., Scarselli, F., Jain, L.C. (eds) Innovations in Neural Information Paradigms and Applications. Studies in Computational Intelligence, vol 247. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04003-0_7


  • DOI: https://doi.org/10.1007/978-3-642-04003-0_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04002-3

  • Online ISBN: 978-3-642-04003-0

  • eBook Packages: Engineering (R0)
