Abstract
Statistical clustering divides given samples into groups according to assumed distributions. In high-dimensional problems, such as document or image clustering, direct methods suffer from over-fitting and the curse of dimensionality. A common remedy is to reduce the dimensionality first and then apply a clustering algorithm. However, such two-stage methods neglect the interaction between the two processes. In this report, we propose a hierarchical joint distribution of Latent Dirichlet Allocation and a Polya mixture, and give a parameter estimation algorithm based on Gibbs sampling. Experiments on some benchmarks show the effectiveness of the proposed method.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hosino, T. (2010). Bayesian Joint Optimization for Topic Model and Clustering. In: Diamantaras, K., Duch, W., Iliadis, L.S. (eds) Artificial Neural Networks – ICANN 2010. ICANN 2010. Lecture Notes in Computer Science, vol 6352. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15819-3_11
DOI: https://doi.org/10.1007/978-3-642-15819-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15818-6
Online ISBN: 978-3-642-15819-3
eBook Packages: Computer Science (R0)