Ensemble Feature Selection Based on the Contextual Merit

  • Seppo Puuronen
  • Iryna Skrypnyk
  • Alexey Tsymbal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2114)


Recent research has demonstrated the benefits of using ensembles of classifiers for classification problems. Machine learning methods that manipulate the training set are used to construct ensembles consisting of diverse sets of accurate classifiers. Feature selection techniques that apply different heuristics to generate base classifiers can be adjusted to specific domain characteristics. In this paper we consider and experiment with the contextual feature merit measure as a feature selection heuristic. We use the diversity of an ensemble as the evaluation function in our new algorithm, which includes a refinement cycle. We evaluated the algorithm on seven data sets from the UCI repository. The experimental results show that, for all seven data sets, ensemble feature selection based on the contextual merit with a suitable starting number of features produces an ensemble whose accuracy with weighted voting is never lower than that of C4.5 alone with all the features.
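
Two notions from the abstract, ensemble diversity and weighted voting, can be illustrated with a minimal sketch. It is not the algorithm of the paper: diversity is assumed here to be the mean pairwise disagreement rate between base classifiers, and the weights are assumed to be per-classifier scores such as validation accuracies. The function names, predictions, and weights are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_diversity(predictions):
    """Mean fraction of instances on which two base classifiers disagree.

    predictions: one list of predicted labels per base classifier,
    all of the same length. This is an assumed diversity measure;
    the paper's exact definition may differ.
    """
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 0.0
    n = len(predictions[0])
    total = sum(sum(a != b for a, b in zip(p, q)) / n for p, q in pairs)
    return total / len(pairs)


def weighted_vote(predictions, weights):
    """Combine base classifier predictions per instance by weighted voting."""
    combined = []
    for votes in zip(*predictions):
        score = defaultdict(float)
        for label, weight in zip(votes, weights):
            score[label] += weight
        # The label with the largest summed weight wins.
        combined.append(max(score, key=score.get))
    return combined


# Hypothetical predictions of three base classifiers on four instances.
preds = [
    ["a", "b", "a", "b"],
    ["a", "a", "a", "b"],
    ["b", "b", "a", "a"],
]
weights = [0.9, 0.6, 0.5]  # assumed weights, e.g. validation accuracies

diversity = pairwise_diversity(preds)
ensemble = weighted_vote(preds, weights)
```

In a search over feature subsets, a measure like `pairwise_diversity` could serve as the evaluation function that decides whether a candidate change to a base classifier's features is kept, while `weighted_vote` produces the final ensemble prediction.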


Feature Selection; Feature Subset; Base Classifier; Heuristic Rule; Weighted Vote
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Seppo Puuronen
    • 1
  • Iryna Skrypnyk
    • 1
  • Alexey Tsymbal
    • 1
  1. Department of Computer Science and Information Systems, University of Jyväskylä, Jyväskylä, Finland
