Discrete Component Analysis

Buntine, Wray; Jakulin, Aleks

doi:10.1007/11752790_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3940)

Included in the following conference series:

International Statistical and Optimization Perspectives Workshop "Subspace, Latent Structure and Feature Selection"

Abstract

This article presents a unified theory for analysis of components in discrete data, and compares the methods with techniques such as independent component analysis, non-negative matrix factorisation and latent Dirichlet allocation. The main families of algorithms discussed are a variational approximation, Gibbs sampling, and Rao-Blackwellised Gibbs sampling. Applications are presented for voting records from the United States Senate for 2003, and for the Reuters-21578 newswire collection.
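
As a rough illustration of the sampling family named in the abstract, the following is a minimal sketch of a collapsed Gibbs sampler for a latent Dirichlet allocation style model, in which the per-document topic proportions and per-topic word distributions are integrated out and only the token-level topic assignments are sampled. It is not taken from the chapter: the function name `collapsed_gibbs_lda`, the symmetric hyperparameters `alpha` and `beta`, and the toy corpus are assumptions made for illustration, and treating "Rao-Blackwellised" as this collapsed scheme is one common reading rather than a statement of the authors' exact algorithm.

```python
import random


def collapsed_gibbs_lda(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Illustrative collapsed Gibbs sampler (hypothetical helper, not the chapter's code).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word, assignments), where the first two are count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                 # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]   # times word w is assigned to topic k
    topic_total = [0] * n_topics                               # total tokens assigned to topic k
    assignments = []

    # Start from a uniformly random topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all count tables.
                k_old = assignments[d][i]
                doc_topic[d][k_old] -= 1
                topic_word[k_old][w] -= 1
                topic_total[k_old] -= 1

                # Unnormalised conditional p(z = k | all other assignments),
                # with proportions and topic-word distributions integrated out.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                k_new = rng.choices(range(n_topics), weights=weights)[0]

                # Add the token back under its newly sampled topic.
                assignments[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][w] += 1
                topic_total[k_new] += 1

    return doc_topic, topic_word, assignments


# Toy usage: three tiny "documents" over a four-word vocabulary.
toy_docs = [[0, 1, 0, 1], [2, 3, 2], [0, 1, 2, 3, 3]]
doc_topic, topic_word, z = collapsed_gibbs_lda(toy_docs, n_topics=2, vocab_size=4, n_iters=50)
```

The count tables are the only state the sampler keeps, which is what makes the collapsed form compact; point estimates of the component distributions can be read off afterwards by normalising `topic_word[k][w] + beta` over words.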

Author information

Authors and Affiliations

Helsinki Institute for Information Technology (HIIT), Dept. of Computer Science, University of Helsinki, PL 68, 00014, Finland
Wray Buntine
Department of Knowledge Technologies, Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Aleks Jakulin

Editor information

Editors and Affiliations

ISIS Research Group, University of Southampton, Southampton, U.K.
Craig Saunders
Dept. of Knowledge Technologies, Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Marko Grobelnik
School of Electronics and Computer Science, University of Southampton, Building 1, Highfield Campus, SO17 1BJ, Southampton, UK
Steve Gunn
The Centre for Computational Statistics and Machine Learning Department of Computer Science, University College London, Gower St., WC1E 6BT, London, UK
John Shawe-Taylor

Cite this paper

Buntine, W., Jakulin, A. (2006). Discrete Component Analysis. In: Saunders, C., Grobelnik, M., Gunn, S., Shawe-Taylor, J. (eds) Subspace, Latent Structure and Feature Selection. SLSFS 2005. Lecture Notes in Computer Science, vol 3940. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752790_1

DOI: https://doi.org/10.1007/11752790_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34137-6
Online ISBN: 978-3-540-34138-3
eBook Packages: Computer Science, Computer Science (R0)
