L2 Normalized Data Clustering Through the Dirichlet Process Mixture Model of von Mises Distributions with Localized Feature Selection
In this chapter, we propose a probabilistic model-based approach for clustering L2 normalized data. Our approach is based on the Dirichlet process mixture model of von Mises (VM) distributions. Since it assumes an infinite number of clusters (i.e., mixture components), the Dirichlet process mixture model of VM distributions can also be considered an infinite VM mixture model. In contrast to finite mixture models, in which the number of mixture components has to be determined through extra effort, the infinite VM mixture model is a nonparametric model: the number of mixture components is assumed to be infinite initially and is inferred automatically during the learning process. To improve clustering performance on high-dimensional data, a localized feature selection scheme is integrated into the infinite VM mixture model, which can effectively detect irrelevant features based on the estimated feature saliencies. To learn the proposed infinite mixture model with localized feature selection, we develop an effective approach based on variational inference that estimates model parameters and feature saliencies with closed-form solutions. Our model-based clustering approach is validated through two challenging applications, namely topic novelty detection and unsupervised image categorization.
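To make the setting concrete, the sketch below (not the chapter's implementation) shows the two basic ingredients it assumes: projecting feature vectors onto the unit hypersphere via L2 normalization, and evaluating the log-density of a von Mises distribution (often called von Mises-Fisher in more than two dimensions) at such a unit vector. The function names and the use of NumPy/SciPy are our own illustrative choices.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function of the first kind

def l2_normalize(X, eps=1e-12):
    """Project each row of X onto the unit hypersphere (L2 normalization)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def vmf_log_pdf(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution with mean direction mu
    (a unit vector) and concentration kappa > 0, evaluated at unit vector x."""
    d = x.shape[-1]
    order = d / 2.0 - 1.0
    # log C_d(kappa) = (d/2 - 1) log kappa - (d/2) log(2 pi) - log I_{d/2-1}(kappa);
    # ive(v, k) = iv(v, k) * exp(-k), so log iv = log ive + kappa (numerically stable)
    log_norm = (order * np.log(kappa)
                - (d / 2.0) * np.log(2.0 * np.pi)
                - (np.log(ive(order, kappa)) + kappa))
    return log_norm + kappa * (x @ mu)

rng = np.random.default_rng(0)
Xn = l2_normalize(rng.normal(size=(5, 3)))  # every row now has unit L2 norm
mu = np.array([1.0, 0.0, 0.0])
# the density is highest in the mean direction and lowest opposite to it
assert vmf_log_pdf(mu, mu, 5.0) > vmf_log_pdf(-mu, mu, 5.0)
```

A mixture model then combines several such components with mixing weights; in the chapter's nonparametric setting those weights come from a Dirichlet process rather than a fixed finite count.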
Keywords: Clustering · Spherical data · von Mises distribution · Mixture models · Feature selection · Novelty detection · Image categorization
The completion of this work was supported by the National Natural Science Foundation of China (61876068), the Natural Science Foundation of Fujian Province (2018J01094), and the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (ZQNPY510).