Abstract
The wide application of technology generates data of very high dimensionality. Using such data for decision-making is hampered by the curse of dimensionality: selecting all features leads to overfitting, while ignoring relevant ones causes information loss. Feature selection algorithms address this problem by identifying a subset of the original features that retains the relevant features and removes the redundant ones. This paper evaluates and analyzes some of the most popular feature selection algorithms on several benchmark datasets. Relief, ReliefF, and Random Forest are evaluated as combinations of different rankers and classifiers. Empirically, the accuracy of a given ranker and classifier combination varies from dataset to dataset. The paper also introduces the application of multivariate correlation analysis (MCA) to feature selection. The results suggest that MCA performs better than the legacy feature selection algorithms.
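To make "removing the redundant ones" concrete, the following is a minimal sketch of a correlation-based redundancy filter. It is not the MCA method proposed in the paper, whose details the abstract does not give. The function name `drop_redundant_features`, the `threshold` parameter, and the greedy left-to-right order are all assumptions made for illustration.

```python
import numpy as np


def drop_redundant_features(X, threshold=0.9):
    """Greedy filter: keep a feature only if its absolute Pearson correlation
    with every already-kept feature is at or below `threshold`.

    Hypothetical helper for illustration; `threshold` and the greedy
    left-to-right order are assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Absolute pairwise Pearson correlation between columns (features).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        # Feature j survives only if it is not strongly correlated
        # with any feature that has already been kept.
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


# Synthetic data: column 2 is a near-copy of column 0, so it is redundant.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=200)])

selected = drop_redundant_features(X, threshold=0.9)
print(selected)
```

On this synthetic data the filter keeps the first two columns and discards the third. It needs no class labels, so it is unsupervised, but it only measures pairwise linear dependence. Rankers such as Relief or Random Forest importance score relevance to the target instead.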
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pattanshetti, T., Attar, V. (2019). Unsupervised Feature Selection Using Correlation Score. In: Iyer, B., Nalbalwar, S., Pathak, N. (eds) Computing, Communication and Signal Processing. Advances in Intelligent Systems and Computing, vol 810. Springer, Singapore. https://doi.org/10.1007/978-981-13-1513-8_37
DOI: https://doi.org/10.1007/978-981-13-1513-8_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1512-1
Online ISBN: 978-981-13-1513-8
eBook Packages: Engineering, Engineering (R0)