Abstract
Preprocessing is the data mining step that follows data collection. Several issues must be addressed in the preprocessing stage, one of which is feature selection (FS) or feature reduction (FR). Existing approaches to FS and FR are categorized as filter, wrapper, and embedded methods. In this research, we introduce a novel filter-based feature selection framework called LtR (left to right) and RtL (right to left), based on symmetrical uncertainty (SU). Our method generates K subsets of features such that each subset contains a finite number of unique features. Each subset is evaluated using various classifiers (JRip, OneR, Ridor, J48, SimpleCart, Naive Bayes, IBk) and compared with existing filter-based FS methods: information gain (IG), ReliefF (Rel), chi-squared attribute evaluator (Chi), and gain ratio attribute evaluator (GR). Experimental analysis revealed that at least one of the subsets performs better than some of the existing methods.
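As a rough illustration of the SU measure on which the framework is based, the sketch below computes symmetrical uncertainty between a discrete feature and the class label using the standard definition SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), then ranks features by that score. It covers only the SU ranking step. The function names (`entropy`, `symmetrical_uncertainty`, `rank_features`) and the toy data are assumptions made for this example and are not taken from the chapter.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; treat as uninformative
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy     # information gain (mutual information)
    return 2.0 * info_gain / (h_x + h_y)


def rank_features(columns, labels):
    """Return feature names sorted by decreasing SU with the class label."""
    scores = {name: symmetrical_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: four instances, three discrete features.
labels = ["a", "a", "b", "b"]
columns = {
    "f_noise": ["p", "q", "p", "q"],    # independent of the label
    "f_partial": ["x", "x", "x", "y"],  # partly informative
    "f_copy": ["x", "x", "y", "y"],     # determines the label exactly
}
ranking = rank_features(columns, labels)
print(ranking)
```

Numeric attributes would need to be discretized before applying this function, since the entropy estimate here counts exact value matches.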
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Potharaju, S.P., Sreedevi, M. (2019). A Novel LtR and RtL Framework for Subset Feature Selection (Reduction) for Improving the Classification Accuracy. In: Pati, B., Panigrahi, C., Misra, S., Pujari, A., Bakshi, S. (eds) Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing, vol 713. Springer, Singapore. https://doi.org/10.1007/978-981-13-1708-8_20
DOI: https://doi.org/10.1007/978-981-13-1708-8_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1707-1
Online ISBN: 978-981-13-1708-8
eBook Packages: Engineering, Engineering (R0)