Latent Variables, Topographic Mappings and Data Visualization
Most pattern recognition tasks, such as regression, classification and novelty detection, can be viewed in terms of probability density estimation. A powerful approach to probabilistic modelling is to represent the observed variables in terms of a number of hidden, or latent, variables. One well-known example of a latent variable model is the mixture distribution, in which the latent variable is the discrete component label. With continuous latent variables we obtain models such as factor analysis. In this paper we provide an overview of latent variable models, and we show how a particular form of linear latent variable model provides a probabilistic formulation of the well-known technique of principal component analysis (PCA). Extending this technique to mixtures, and to hierarchical mixtures, of probabilistic PCA models leads to a powerful interactive algorithm for data visualization. We also show how the probabilistic PCA approach can be generalized to non-linear latent variable models, leading to the Generative Topographic Mapping (GTM) algorithm. Finally, we show how GTM can itself be extended to model temporal data.
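As a concrete illustration of the linear latent variable model mentioned above, the following is a minimal sketch of probabilistic PCA, assuming the usual formulation x = W z + mu + noise with isotropic Gaussian noise of variance sigma2 and a standard Gaussian prior on z. The abstract does not state an estimation procedure, so the closed-form maximum-likelihood fit used here is an assumption of this sketch, and the names `fit_ppca` and `posterior_mean` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood fit of a q-dimensional probabilistic PCA model.

    Assumes q is smaller than the data dimension, so that at least one
    eigenvalue is left over to estimate the noise variance.
    """
    N, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc / N                          # ML sample covariance (divides by N)
    eigvals, eigvecs = np.linalg.eigh(S)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                # noise variance: mean discarded eigenvalue
    # Scale the leading eigenvectors; the arbitrary rotation is taken as the identity.
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale
    return mu, W, sigma2


def posterior_mean(X, mu, W, sigma2):
    """Posterior mean of the latent variables, one row per data point."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T


# Hypothetical synthetic data: 2 latent dimensions embedded in 5 observed ones.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))
X = rng.normal(size=(200, 2)) @ A.T + 0.1 * rng.normal(size=(200, 5))

mu, W, sigma2 = fit_ppca(X, q=2)
Z = posterior_mean(X, mu, W, sigma2)           # 2-D coordinates usable for plotting
```

The rows of `Z` give a two-dimensional projection of the data, which is the sense in which such a model supports visualization. Unlike conventional PCA, the fit also defines a density over the data space with covariance `W @ W.T + sigma2 * I`, which is what makes extensions such as mixtures of these models possible.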