Latent Variables, Topographic Mappings and Data Visualization

  • Christopher M. Bishop
Conference paper
Part of the Perspectives in Neural Computing book series (PERSPECT.NEURAL)

Abstract

Most pattern recognition tasks, such as regression, classification and novelty detection, can be viewed in terms of probability density estimation. A powerful approach to probabilistic modelling is to represent the observed variables in terms of a number of hidden, or latent, variables. One well-known example of a hidden variable model is the mixture distribution in which the hidden variable is the discrete component label. In the case of continuous latent variables we obtain models such as factor analysis. In this paper we provide an overview of latent variable models, and we show how a particular form of linear latent variable model can be used to provide a probabilistic formulation of the well-known technique of principal components analysis (PCA). By extending this technique to mixtures, and hierarchical mixtures, of probabilistic PCA models we are led to a powerful interactive algorithm for data visualization. We also show how the probabilistic PCA approach can be generalized to non-linear latent variable models leading to the Generative Topographic Mapping algorithm (GTM). Finally, we show how GTM can itself be extended to model temporal data.
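
As a brief illustration of the linear latent variable model mentioned above, probabilistic PCA is commonly written as the following generative model. The notation is an assumption made here for illustration and does not appear on this page: a q-dimensional latent vector z, a d-dimensional observation x, a d × q loading matrix W, a mean vector μ and an isotropic noise variance σ².

```latex
% Sketch of the probabilistic PCA generative model (notation assumed, not from the source page)
\begin{align}
  \mathbf{z} &\sim \mathcal{N}(\mathbf{0},\, \mathbf{I}_q), \\
  \mathbf{x} \mid \mathbf{z} &\sim \mathcal{N}(\mathbf{W}\mathbf{z} + \boldsymbol{\mu},\; \sigma^2 \mathbf{I}_d), \\
  \mathbf{x} &\sim \mathcal{N}(\boldsymbol{\mu},\; \mathbf{W}\mathbf{W}^{\mathsf{T}} + \sigma^2 \mathbf{I}_d).
\end{align}
```

The third line follows from the first two by integrating out z. Under this formulation the maximum-likelihood W spans the principal subspace of the data covariance, which is the sense in which the model gives a probabilistic formulation of PCA. Restricting the noise covariance to be isotropic is what separates it from factor analysis, where the noise covariance is a general diagonal matrix.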

Copyright information

© Springer-Verlag London Limited 1998

Authors and Affiliations

  • Christopher M. Bishop (1)
  1. Neural Computing Research Group, Dept. of Computer Science and Applied Mathematics, Aston University, Birmingham, UK
