Abstract
In clustering, most feature selection approaches consider all features of the data together to identify a single common feature subset that contributes to discovering the interesting clusters. However, many datasets comprise multiple feature subsets, each of which corresponds to the meaningful clusters in a different way. In this paper, we attempt to reveal a feature partition consisting of multiple non-overlapping feature blocks, where each block fits a finite mixture model. To find the desired feature partition, we use a local search algorithm based on simulated annealing. During the search for the optimal feature partition, we reuse previous estimation results to reduce computational cost.
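The search described above can be illustrated with a minimal sketch. The sketch below is not the paper's algorithm: it scores each feature block with a single Gaussian per feature via AIC (a stand-in for the paper's per-block finite mixture model), proposes moves that shift one feature between blocks, accepts them under a simulated-annealing criterion, and caches block scores so estimates for unchanged blocks are reused. All names and parameters here are illustrative.

```python
import math
import random

def block_score(data, block):
    # Simplification: fit one Gaussian per feature in the block (the paper
    # uses a finite mixture model per block) and return the block's AIC.
    n = len(data)
    ll = 0.0
    for j in block:
        col = [row[j] for row in data]
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / n or 1e-9  # guard zero variance
        ll += sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
                  for x in col)
    k = 2 * len(block)            # parameters: mean + variance per feature
    return 2 * k - 2 * ll         # AIC; lower is better

def anneal_partition(data, n_features, steps=2000, t0=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    part = [set(range(n_features))]   # start with all features in one block
    cache = {}                        # reuse of previous estimation results

    def score(p):
        total = 0.0
        for b in p:
            key = frozenset(b)
            if key not in cache:      # only re-estimate blocks that changed
                cache[key] = block_score(data, b)
            total += cache[key]
        return total

    cur, t = score(part), t0
    for _ in range(steps):
        # Proposal: move one random feature to another block or a new block.
        new = [set(b) for b in part]
        j = rng.randrange(n_features)
        src = next(b for b in new if j in b)
        src.discard(j)
        new = [b for b in new if b]
        if not new or rng.random() < 0.3:
            new.append({j})           # open a new singleton block
        else:
            rng.choice(new).add(j)    # join an existing block
        cand = score(new)
        # Metropolis acceptance: always take improvements, sometimes worse moves.
        if cand < cur or rng.random() < math.exp((cur - cand) / t):
            part, cur = new, cand
        t *= cooling                  # geometric cooling schedule
    return part, cur
```

The cache keyed on frozen feature sets is the point of the reuse step: a proposal changes at most two blocks, so the scores of all other blocks come from earlier iterations for free.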
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Namkoong, Y., Joo, Y., Dankel, D.D. (2010). Feature Subset-Wise Mixture Model-Based Clustering via Local Search Algorithm. In: Farzindar, A., Kešelj, V. (eds) Advances in Artificial Intelligence. Canadian AI 2010. Lecture Notes in Computer Science, vol 6085. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13059-5_15
DOI: https://doi.org/10.1007/978-3-642-13059-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13058-8
Online ISBN: 978-3-642-13059-5
eBook Packages: Computer Science