Abstract
In an era where accumulating data is easy and storing it is inexpensive, feature selection plays a central role in reducing the high dimensionality of huge amounts of otherwise meaningless data. In this paper, we propose a graph-based method for feature selection that ranks features by identifying the most important ones within an arbitrary set of cues. We map the problem onto an affinity graph, whose nodes are the features, and obtain the solution by assessing the importance of nodes through indicators of centrality, in particular Eigenvector Centrality (EC). The gist of EC is to estimate the importance of a feature as a function of the importance of its neighbors. Ranking nodes by centrality identifies candidate features, which prove effective from a classification point of view, as shown by a thorough experimental evaluation. Our approach has been tested on 7 diverse datasets from recent literature (e.g., biological data and object recognition, among others), and compared against filter, embedded, and wrapper methods. The results are remarkable in terms of accuracy, stability, and low execution time.
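To make the idea of ranking features by Eigenvector Centrality concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small, hand-made symmetric affinity matrix with non-negative weights (the abstract does not say how the affinities are built), and it approximates the leading eigenvector with plain power iteration, a standard technique chosen here only for illustration. The names `eigenvector_centrality`, `rank_features`, and `affinity` are hypothetical.

```python
import math


def eigenvector_centrality(A, iters=1000, tol=1e-10):
    """Approximate the leading eigenvector of a non-negative square
    matrix A (list of lists) by power iteration. Entry i of the result
    is the centrality score of node (feature) i."""
    n = len(A)
    v = [1.0 / n] * n  # positive starting vector
    for _ in range(iters):
        # One multiplication by A: each node collects the scores of its
        # neighbors, weighted by the affinity of the connecting edge.
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v  # graph without edges: keep the uniform scores
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def rank_features(A):
    """Return feature indices sorted from most to least central."""
    scores = eigenvector_centrality(A)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy affinity graph over four features (illustrative values only).
# Features 0, 1 and 2 are strongly tied to each other; feature 3 is
# only weakly tied to the rest.
affinity = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.1],
    [0.8, 0.7, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]

scores = eigenvector_centrality(affinity)
ranking = rank_features(affinity)
print(ranking)  # → [0, 1, 2, 3]
```

In this toy graph, feature 0 ranks first because its neighbors are themselves well connected, while feature 3 ranks last because all of its edges are weak. Selecting the top-k entries of `ranking` would give the candidate feature subset.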
Notes
1. The FSLib is publicly available on File Exchange - MATLAB Central at: https://it.mathworks.com/matlabcentral/fileexchange/56937-feature-selection-library.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Roffo, G., Melzi, S. (2017). Ranking to Learn:. In: Appice, A., Ceci, M., Loglisci, C., Masciari, E., Raś, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2016. Lecture Notes in Computer Science, vol 10312. Springer, Cham. https://doi.org/10.1007/978-3-319-61461-8_2
DOI: https://doi.org/10.1007/978-3-319-61461-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61460-1
Online ISBN: 978-3-319-61461-8
eBook Packages: Computer Science, Computer Science (R0)