Abstract
Producing stable feature rankings is critical in many areas, such as bioinformatics, where the robustness of a ranked list of genes is crucial to its interpretation by a domain expert. In this paper, we study Spearman's rho as a measure of stability to perturbations of the training data, not merely as a heuristic: we prove that it is the natural measure of stability when mean rank aggregation is used. We also provide insights into the properties of this stability measure that allow stability values to be interpreted usefully, e.g. how close a stability value is to that of a purely random feature ranking process, and concepts such as the expected value of a stability estimator.
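To make the link between Spearman's rho and mean rank aggregation concrete, here is a minimal sketch, not the authors' code or their proof. It assumes that stability is taken as the average Spearman's rho over all pairs of M feature rankings (e.g. one ranking per perturbed training set), and that each ranking is a vector of ranks 1..d with no ties. The helper names `spearman_rho`, `stability`, `stability_from_mean_ranks` and `random_ranking` are hypothetical. The second estimator uses the standard identity between the average pairwise rho and Kendall's W, avg_rho = (M·W − 1)/(M − 1), which shows that this quantity depends on the M rankings only through their mean ranks; the paper's own derivation may differ. The demo also illustrates the random baseline: two independent, uniformly random rankings have an expected Spearman's rho of 0.

```python
import itertools
import random
import statistics


def spearman_rho(r1, r2):
    """Spearman's rho between two rankings of the same d features (no ties).

    r1[f] and r2[f] are the ranks (1..d) assigned to feature f.
    """
    d = len(r1)
    sum_sq = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_sq / (d * (d ** 2 - 1))


def stability(rankings):
    """Average Spearman's rho over all pairs of the M rankings (assumed measure)."""
    pairs = list(itertools.combinations(rankings, 2))
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


def stability_from_mean_ranks(rankings):
    """Same quantity, computed only from the mean rank of each feature.

    Kendall's W is the sum of squared deviations of the mean ranks from
    (d + 1) / 2, divided by its maximum value d * (d**2 - 1) / 12; the
    identity avg_rho = (M * W - 1) / (M - 1) then recovers the average rho.
    """
    m, d = len(rankings), len(rankings[0])
    mean_ranks = [sum(r[f] for r in rankings) / m for f in range(d)]
    centre = (d + 1) / 2
    w = sum((mr - centre) ** 2 for mr in mean_ranks) / (d * (d ** 2 - 1) / 12)
    return (m * w - 1) / (m - 1)


def random_ranking(d, rng):
    """A uniformly random ranking of d features (the 'purely random' process)."""
    perm = list(range(1, d + 1))
    rng.shuffle(perm)
    return perm


if __name__ == "__main__":
    rng = random.Random(0)
    d, m = 20, 5

    # Every run returns the identical ranking: maximal stability.
    identical = [list(range(1, d + 1))] * m
    print(stability(identical))  # 1.0

    # Purely random rankings: the average stability is close to 0.
    runs = [[random_ranking(d, rng) for _ in range(m)] for _ in range(200)]
    print(statistics.mean(stability(r) for r in runs))
```

Both estimators return the same value for any set of tie-free rankings, which is why the sketch computes it two ways: the pairwise form matches the "Spearman's rho" reading, while the mean-rank form matches the "mean rank aggregation" reading.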
Notes
1. Available online at http://www.cs.man.ac.uk/~nogueirs/files/IbPRIA2017-supplementary-material.pdf.
Acknowledgements
The authors gratefully acknowledge the support of the EPSRC for the Manchester Centre for Doctoral Training in Computer Science (EP/I028099/1) and the LAMBDA project (EP/N035127/1).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Nogueira, S., Sechidis, K., Brown, G. (2017). On the Use of Spearman's Rho to Measure the Stability of Feature Rankings. In: Alexandre, L., Salvador Sánchez, J., Rodrigues, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2017. Lecture Notes in Computer Science, vol 10255. Springer, Cham. https://doi.org/10.1007/978-3-319-58838-4_42
DOI: https://doi.org/10.1007/978-3-319-58838-4_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58837-7
Online ISBN: 978-3-319-58838-4
eBook Packages: Computer Science, Computer Science (R0)