A Feature Set Decomposition Method for the Construction of Multi-classifier Systems Trained with High-Dimensional Data
Data mining for the discovery of novel, useful patterns encounters obstacles when dealing with high-dimensional datasets; these obstacles have been documented as the “curse” of dimensionality. One strategy for dealing with this issue is to decompose the input feature set in order to build a multi-classifier system. Standalone decomposition methods are rare and are generally based on random selection. We propose a decomposition method that uses information theory tools to arrange input features into uncorrelated and relevant subsets. Experimental results show that this approach significantly outperforms three baseline decomposition methods in terms of classification accuracy.
Keywords: multi-classifier systems, feature set decomposition, information theory
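The abstract does not spell out the decomposition procedure, so the following is only a minimal sketch of how an information-theoretic feature set decomposition might look, not the authors' algorithm. It assumes discrete features, uses symmetrical uncertainty as the measure of both relevance (feature versus class) and correlation (feature versus feature), and assigns features greedily. The function names `decompose` and `redundancy`, the parameter `min_relevance`, and the greedy assignment rule are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_information = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_information / (hx + hy)


def redundancy(feature, subset, columns):
    """Highest SU between `feature` and any feature already in `subset`."""
    return max(
        (symmetrical_uncertainty(columns[feature], columns[g]) for g in subset),
        default=0.0,
    )


def decompose(columns, labels, n_subsets, min_relevance=0.0):
    """Greedy decomposition (an assumption, not the published method).

    1. Score each feature by its SU with the class labels.
    2. Discard features whose score does not exceed `min_relevance`.
    3. Visit the rest from most to least relevant, placing each one in the
       subset where it is least correlated with the features already there
       (ties are broken in favour of the smaller subset).
    """
    relevance = {f: symmetrical_uncertainty(col, labels) for f, col in columns.items()}
    ranked = sorted(
        (f for f in columns if relevance[f] > min_relevance),
        key=lambda f: -relevance[f],
    )
    subsets = [[] for _ in range(n_subsets)]
    for f in ranked:
        target = min(subsets, key=lambda s: (redundancy(f, s, columns), len(s)))
        target.append(f)
    return subsets


# Toy data: "b" mirrors "a" (fully redundant), "c" is constant (irrelevant).
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_columns = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [1, 1, 1, 1, 0, 0, 0, 0],
    "c": [0, 0, 0, 0, 0, 0, 0, 0],
    "d": [0, 0, 0, 1, 0, 1, 1, 1],
}
toy_subsets = decompose(toy_columns, toy_labels, n_subsets=2)
print(toy_subsets)
```

In this toy run the two mutually redundant features are placed in different subsets and the constant feature is dropped. Each resulting subset could then be used to train one member of a multi-classifier system; the choice of base classifier and of the combination rule is left open here.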