Algorithmic cache of sorted tables for feature selection
Feature selection is a mechanism used in machine learning to reduce the complexity and improve the speed of the learning process by using a subset of the features in the data set. Several measures assign a score to a subset of features and thereby make it possible to compare subsets and decide which one is best. The bottleneck of consistency measures is that they require the examples to be available grouped by their feature values so that their classes can be checked group by group. To address this, the paper proposes the concept of an algorithmic cache, which stores sorted tables to speed up access to example information. The work carries out an empirical study using 34 real-world data sets and four representative search strategies, combined with different table-caching strategies and three sorting methods. The experiments evaluate four consistency measures and one information measure, showing that the proposed sorted-tables cache reduces computation time and is competitive with hash-table structures.
Keywords: Feature selection · Attribute selection · Consistency measures · Information theory · Data reduction · Algorithmic cache
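To make the idea concrete, the following is a minimal sketch (not the paper's implementation) of how an algorithmic cache of sorted tables might serve a consistency measure: the row order of the data set sorted by a feature subset is cached, so that repeated evaluations of the same subset during a search can scan examples grouped by their projected values without re-sorting. The class `SortedTableCache` and the inconsistency-rate formula used here (group size minus majority-class count, summed over groups and divided by the number of examples) are illustrative assumptions.

```python
from typing import Dict, List, Tuple


class SortedTableCache:
    """Illustrative cache of sorted tables for consistency-based feature selection.

    For each evaluated feature subset, the row order that sorts the data set by
    the subset's values is computed once and reused on later evaluations.
    """

    def __init__(self, data: List[List[int]], labels: List[int]):
        self.data = data
        self.labels = labels
        # Cache: feature subset -> row indices sorted by the projected values.
        self._cache: Dict[Tuple[int, ...], List[int]] = {}

    def sorted_order(self, subset: Tuple[int, ...]) -> List[int]:
        # Reuse a previously computed order when the same subset is requested again.
        if subset not in self._cache:
            self._cache[subset] = sorted(
                range(len(self.data)),
                key=lambda i: tuple(self.data[i][f] for f in subset),
            )
        return self._cache[subset]

    def inconsistency_rate(self, subset: Tuple[int, ...]) -> float:
        """One common consistency measure: examples that share the same values
        on `subset` but differ in class are inconsistent; each group contributes
        (group size - majority class count)."""
        order = self.sorted_order(subset)
        n = len(order)
        inconsistent = 0
        i = 0
        while i < n:
            proj = tuple(self.data[order[i]][f] for f in subset)
            counts: Dict[int, int] = {}
            j = i
            # Sorting guarantees that rows with equal projected values are adjacent.
            while j < n and tuple(self.data[order[j]][f] for f in subset) == proj:
                lbl = self.labels[order[j]]
                counts[lbl] = counts.get(lbl, 0) + 1
                j += 1
            inconsistent += (j - i) - max(counts.values())
            i = j
        return inconsistent / n
```

For example, with `data = [[0, 0], [0, 0], [0, 1], [1, 1]]` and `labels = [0, 1, 0, 0]`, the first two rows agree on both features but disagree in class, so `inconsistency_rate((0, 1))` is `0.25`; a second call on the same subset skips the sort and reuses the cached order.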