Journal of the Operational Research Society, Volume 68, Issue 9, pp 1117–1130

Variable selection methods for multi-class classification using signomial function

  • Kyoungmi Hwang
  • Kyungsik Lee
  • Sungsoo Park


We develop several variable selection methods that use a signomial function to select variables relevant to multi-class classification, taking all classes into consideration at once. We introduce an \(\ell _{1}\)-norm regularization function as a surrogate measure of the number of selected variables, together with two adaptive parameters that assign different importance weights to variables according to their relative importance. The proposed methods select variables suitable for predicting the output and automatically determine how many variables to select. With the selected variables, they then obtain the resulting classifiers directly, without an additional classification step. The classifiers obtained by the proposed methods achieve classification accuracy competitive with, or better than, that of existing methods.
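A minimal illustration of the embedded idea described above, though not the authors' signomial formulation: \(\ell _{1}\)-regularized multinomial logistic regression, trained by proximal gradient descent, zeroes out the weights of irrelevant variables and produces the classifier within the same optimization. The data, dimensions, and parameter values below are synthetic assumptions chosen for the sketch.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 10 variables, but only
# the first 3 determine the class label among k = 3 classes.
rng = np.random.default_rng(0)
n, d, k = 300, 10, 3
X = rng.normal(size=(n, d))
W_true = np.zeros((d, k))
W_true[0] = [3.0, -3.0, 0.0]
W_true[1] = [0.0, 3.0, -3.0]
W_true[2] = [-3.0, 0.0, 3.0]
y = np.argmax(X @ W_true + 0.1 * rng.normal(size=(n, k)), axis=1)

# Proximal gradient (ISTA) on l1-regularized softmax cross-entropy:
# the soft-thresholding step drives weights of irrelevant variables to
# exactly zero, so variable selection and classifier training happen in
# one procedure -- the defining property of an embedded method.
Y = np.eye(k)[y]                 # one-hot targets, shape (n, k)
W = np.zeros((d, k))
lam, step = 0.05, 0.1
for _ in range(2000):
    Z = X @ W
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)            # softmax class probabilities
    W -= step * (X.T @ (P - Y) / n)              # gradient step
    W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)  # l1 prox

# A variable is "selected" if it carries a nonzero weight for any class.
selected = np.flatnonzero(np.any(np.abs(W) > 1e-8, axis=1))
accuracy = (np.argmax(X @ W, axis=1) == y).mean()
print("selected variables:", selected, "training accuracy:", accuracy)
```

The regularization weight `lam` plays the role of the trade-off parameter: larger values select fewer variables, so the sparsity level need not be fixed in advance. The paper's methods apply the same sparsity mechanism to signomial terms rather than to raw linear weights.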


Keywords: variable selection, multi-class classification, embedded method, signomial function



This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2013-025297).



Copyright information

© The Operational Research Society 2016

Authors and Affiliations

  1. Test and Package Automation Group, Giheung Hwaseong Complex, Samsung Electronics, Asan-si, Republic of Korea
  2. Department of Industrial Engineering and Institute for Industrial Systems Innovation, Seoul National University, Seoul, Republic of Korea
  3. Department of Industrial and Systems Engineering, KAIST, Daejeon, Republic of Korea
