
A Novel LtR and RtL Framework for Subset Feature Selection (Reduction) for Improving the Classification Accuracy

  • Sai Prasad Potharaju
  • M. Sreedevi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 713)

Abstract

Preprocessing is one of the data mining steps that follows data collection, and several issues need to be addressed at this stage. One of them is feature selection (FS), also called feature reduction (FR). Several approaches are available for FS and FR; they are commonly categorized as filter, wrapper, and embedded methods. In this research, we introduce a novel filter-based feature selection framework, called LtR (left to right) and RtL (right to left), based on symmetrical uncertainty (SU). Our method generates K subsets of features such that each subset contains a finite number of unique features. Each subset is analyzed using various classifiers (JRip, OneR, Ridor, J48, SimpleCart, Naive Bayes, IBk) and compared with existing filter-based FS methods: information gain (IG), ReliefF (Rel), chi-squared attribute evaluator (Chi), and gain ratio attribute evaluator (GR). Experimental analysis revealed that at least one of the generated subsets performs better than some of the existing methods.
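
The abstract does not spell out the exact subset-construction rule, so the following Python sketch illustrates one plausible reading under stated assumptions: features are ranked by symmetrical uncertainty with the class, and K subsets are then formed by drawing alternately from the left (top-ranked) and right (bottom-ranked) ends of the ranking. The dealing rule, function names, and toy data are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of a discrete sequence."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    x_arr, y_arr = np.asarray(x), np.asarray(y)
    h_x, h_y = entropy(x_arr), entropy(y_arr)
    # Conditional entropy H(X | Y), then information gain IG = H(X) - H(X | Y)
    h_x_given_y = 0.0
    for y_val, y_count in Counter(y_arr).items():
        h_x_given_y += (y_count / len(y_arr)) * entropy(x_arr[y_arr == y_val])
    info_gain = h_x - h_x_given_y
    denom = h_x + h_y
    return 0.0 if denom == 0 else 2.0 * info_gain / denom

def ltr_rtl_subsets(X, y, k):
    """Rank features by SU with the class, then deal the ranked features into k
    subsets alternately from the left (strongest) and right (weakest) ends, so
    each subset mixes strong and weak features.
    NOTE: this dealing rule is an assumption made for illustration only."""
    su_scores = [symmetrical_uncertainty(X[:, j], y) for j in range(X.shape[1])]
    ranked = sorted(range(X.shape[1]), key=lambda j: su_scores[j], reverse=True)
    subsets = [[] for _ in range(k)]
    left, right, i = 0, len(ranked) - 1, 0
    while left <= right:
        subsets[i % k].append(ranked[left])       # take from the left end (LtR)
        left += 1
        if left <= right:
            subsets[i % k].append(ranked[right])  # take from the right end (RtL)
            right -= 1
        i += 1
    return subsets, su_scores

# Toy usage with discretised data: 5 features, binary class (hypothetical data).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = np.column_stack([(y + rng.integers(0, 2, size=200)) // 2,  # informative
                     rng.integers(0, 3, size=200),             # noise
                     y,                                        # perfectly informative
                     rng.integers(0, 2, size=200),             # noise
                     y * rng.integers(0, 2, size=200)])        # weakly informative
subsets, scores = ltr_rtl_subsets(X, y, k=2)
print("SU scores:", np.round(scores, 3))
print("Subsets (feature indices):", subsets)
```

Each resulting subset could then be passed to classifiers such as JRip, J48, or Naive Bayes (e.g., in Weka or scikit-learn equivalents) and its accuracy compared against subsets chosen by IG, ReliefF, Chi, or GR, as the abstract describes.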

Keywords

Data mining · Preprocessing · Feature selection · Filter · Symmetrical uncertainty

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of CSE, K L University, Guntur, India
