A Feature Set Decomposition Method for the Construction of Multi-classifier Systems Trained with High-Dimensional Data

  • Yoisel Campos
  • Roberto Estrada
  • Carlos Morell
  • Francesc J. Ferri
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8258)


Data mining for the discovery of novel, useful patterns encounters obstacles when dealing with high-dimensional datasets; these obstacles have been documented as the “curse” of dimensionality. One strategy for dealing with this issue is to decompose the input feature set and build a multi-classifier system. Standalone decomposition methods are rare and generally based on random selection. We propose a decomposition method that uses information theory tools to arrange input features into uncorrelated and relevant subsets. Experimental results show that this approach significantly outperforms three baseline decomposition methods in terms of classification accuracy.
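To make the idea of "uncorrelated and relevant subsets" concrete, the following is a minimal sketch under stated assumptions, not the method proposed in the paper. It assumes discrete features, measures relevance as the mutual information between a feature and the class, and measures redundancy as the mutual information between two features. The function names (`entropy`, `mutual_information`, `decompose`) and the greedy assignment rule are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def decompose(columns, labels, n_subsets):
    """Greedy, hypothetical decomposition (NOT the paper's algorithm).

    Features are visited from most to least relevant, where relevance is
    I(feature; class). Each feature is placed in the subset where its
    largest mutual information with an already placed feature is smallest
    (ties broken by subset size), so redundant features tend to be split up.
    Returns a list of lists of feature indices.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: -relevance[j])
    subsets = [[] for _ in range(n_subsets)]
    for j in order:
        def cost(subset):
            redundancy = max(
                (mutual_information(columns[j], columns[k]) for k in subset),
                default=0.0,
            )
            return (redundancy, len(subset))

        min(subsets, key=cost).append(j)
    return subsets


# Toy data: features 0 and 1 are identical copies of the class label,
# features 2 and 3 are independent of the class and of each other.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
print(decompose(columns, labels, 2))  # [[0, 2], [1, 3]]
```

In this toy case the two duplicated, highly relevant features end up in different subsets, so each base classifier of a multi-classifier system would see one of them. Real high-dimensional data would usually need discretization or a continuous mutual information estimator, which this sketch leaves out.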


multi-classifier systems, feature set decomposition, information theory



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Yoisel Campos (1)
  • Roberto Estrada (1)
  • Carlos Morell (2)
  • Francesc J. Ferri (3)
  1. Univ. de Holguín “Oscar Lucero Moya”, Holguín, Cuba
  2. Computer Science Dept., Univ. Central “Marta Abreu” de Las Villas, Santa Clara, Cuba
  3. Dept. d’Informàtica, Universitat de València, València, Spain
