A common problem in feature selection is establishing how many features should be retained so that no important information is lost. We describe a method for choosing this number that makes use of Support Vector Machines. The method is based on controlling the angle by which the decision hyperplane is tilted due to feature selection.
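The abstract does not give the details of the criterion, but the core idea of controlling the hyperplane's tilt can be sketched as follows: train a linear SVM, rank features by the magnitude of their weights, and keep the smallest set whose reduced weight vector stays within a chosen angle of the original normal. All function and variable names below are ours, and the greedy top-|w_i| selection is an illustrative reading, not necessarily the authors' exact procedure.

```python
import numpy as np

def features_within_angle(w, max_angle_deg):
    """Find the smallest number of top-weighted features such that the
    hyperplane normal restricted to those features is tilted from the
    original normal w by at most max_angle_deg degrees.

    Illustrative sketch: w would come from a linear SVM (e.g. the
    coef_ vector of a trained linear classifier)."""
    w = np.asarray(w, dtype=float)
    order = np.argsort(-np.abs(w))      # features by decreasing |w_i|
    norm_w = np.linalg.norm(w)
    for k in range(1, len(w) + 1):
        keep = order[:k]
        w_reduced = np.zeros_like(w)
        w_reduced[keep] = w[keep]       # zero out removed features
        # angle between the original normal and its projection
        # onto the retained feature subspace
        cos_a = np.dot(w, w_reduced) / (norm_w * np.linalg.norm(w_reduced))
        angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
        if angle <= max_angle_deg:
            return sorted(keep.tolist()), angle
    return list(range(len(w))), 0.0
```

For example, with weights `[3, 0.1, 4, 0.1]` and a 5-degree tolerance, the two dominant features suffice: dropping the two near-zero weights tilts the normal by well under 5 degrees.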

Experiments were performed on three text datasets generated from a Wikipedia dump. The amount of retained information was estimated by classification accuracy. Even though the method is parametric, we show that, as opposed to other methods, once its parameter is chosen it can be applied to a number of similar problems (e.g. one value can be used for various datasets originating from Wikipedia). For a constant value of the parameter, dimensionality was reduced by 78% to 90%, depending on the dataset. The relative accuracy drop due to feature removal was less than 0.5% in these experiments.


Keywords: feature selection, SVM, document categorization



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jacek Rzeniewicz 1
  • Julian Szymański 1

  1. Department of Computer Systems Architecture, Gdańsk University of Technology, Poland
