Data Mining and Knowledge Discovery, Volume 33, Issue 4, pp 964–994

Algorithmic cache of sorted tables for feature selection

Speeding up methods based on consistency and information theory measures
  • Antonio Arauzo-Azofra
  • Alfonso Jiménez-Vílchez
  • José Molina-Baena
  • María Luque-Rodriguez


Feature selection is a mechanism used in machine learning to reduce the complexity and improve the speed of the learning process by using a subset of the features in the data set. Several measures assign a score to a subset of features, making it possible to compare subsets and decide which one is best. The bottleneck of consistency measures is having the information of the different examples available so that their classes can be checked by groups. To handle this, the paper proposes the concept of an algorithmic cache, which stores sorted tables to speed up access to example information. The work carries out an empirical study using 34 real-world data sets and four representative search strategies combined with different table caching strategies and three sorting methods. The experiments compute four consistency measures and one information measure, showing that the proposed sorted-table cache reduces computation time and is competitive with hash table structures.
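The idea of caching sorted tables can be illustrated with a minimal sketch. It is not the authors' implementation: the class `SortedTableCache`, the function `inconsistency_rate`, the choice of the inconsistency rate as the consistency measure, and the toy data are all assumptions made for illustration. The sketch keeps, for each feature tuple already evaluated, an ordering of the example indices in which examples with equal feature values are adjacent, and derives the ordering for an extended subset by stable-sorting the cached parent ordering on the newly added feature only.

```python
from collections import Counter
from itertools import groupby


class SortedTableCache:
    """Hypothetical cache: feature tuple -> row indices with equal keys adjacent."""

    def __init__(self, rows):
        self.rows = rows
        # The empty subset needs no sorting: every row has the same (empty) key.
        self._orders = {(): list(range(len(rows)))}

    def order_for(self, features):
        features = tuple(features)
        if features not in self._orders:
            # Reuse the ordering cached for the subset without its last feature,
            # then stable-sort on that last feature only. Stability keeps rows
            # that agree on all features next to each other.
            parent = self.order_for(features[:-1])
            last = features[-1]
            self._orders[features] = sorted(parent, key=lambda i: self.rows[i][last])
        return self._orders[features]


def inconsistency_rate(cache, labels, features):
    """Fraction of examples outside the majority class of their group."""
    features = tuple(features)

    def key(i):
        return tuple(cache.rows[i][f] for f in features)

    inconsistent = 0
    # One linear scan over the sorted order visits each group exactly once.
    for _, group in groupby(cache.order_for(features), key=key):
        counts = Counter(labels[i] for i in group)
        inconsistent += sum(counts.values()) - max(counts.values())
    return inconsistent / len(labels)


# Toy data (assumed): two discrete features and a class label per example.
rows = [(0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1)]
labels = ["a", "a", "b", "a", "a", "b"]
cache = SortedTableCache(rows)

rate_f0 = inconsistency_rate(cache, labels, (0,))
rate_f0_f1 = inconsistency_rate(cache, labels, (0, 1))  # reuses the order for (0,)
```

A sequential forward search fits this sketch well, because each candidate subset extends one that has already been evaluated. A hash table keyed by the feature values would produce the same groups without sorting, which is the alternative structure the abstract compares against.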


Feature selection, Attribute selection, Consistency measures, Information theory, Data reduction, Algorithmic cache




Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Engineering Sciences, Universidad de Cordoba, Córdoba, Spain
