
Identification of topology-preserving, class-relevant feature subsets using multiobjective optimization

  • Methodologies and Application
Soft Computing

Abstract

In the current work, a multiobjective feature selection technique is proposed which utilizes different quality measures to evaluate the goodness of a reduced feature set. Two different perspectives are incorporated into the feature selection process: (1) the selected subset of features should not destroy the geometric distribution of the sample space, i.e., the neighborhood topology should be preserved in the reduced feature space; (2) the selected feature subset should have minimal redundancy and high correlation with the classes. To capture the first aspect, the concept of shared nearest-neighbor distance is utilized. To capture the second, several information theory-based quality measures such as normalized mutual information, correlation with the class attribute, information gain and entropy are employed. A multiobjective framework is used to optimize these measures, individually and in different combinations, in order to reduce the feature set. The approach is evaluated on six publicly available data sets with respect to different classifiers, and the results conclusively demonstrate the benefit of utilizing both types of objective functions in reducing the feature set. Several performance metrics, such as accuracy, redundancy and Jaccard score, are used to measure the quality of the selected feature subset in comparison with several state-of-the-art techniques. Experimental results on these data sets illustrate that there is no universal model (i.e., no single combination of objective functions) that performs well over all the data sets with respect to all quality measures; in general, however, optimizing all objective functions together (the PMCI model) performs consistently well across the data sets.
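The two kinds of objectives can be made concrete with a short sketch. The following Python snippet is a minimal illustration, not the paper's implementation: the function names and the k-nearest-neighbor formulation are assumptions, and scikit-learn's mutual_info_classif is used as a stand-in for the normalized mutual information and class-correlation measures. It scores a candidate binary feature mask on class relevance and on how well the neighborhood topology of the full space is preserved in the reduced space.

```python
# A minimal sketch of the two kinds of objectives, assuming a binary feature mask
# over a data matrix X (n_samples x n_features) and class labels y.  The function
# names, the k-NN formulation, and the use of scikit-learn's mutual_info_classif
# as a stand-in for the paper's normalized mutual information are illustrative
# assumptions, not the authors' exact implementation.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.neighbors import NearestNeighbors


def class_relevance(X, y, mask):
    """Average mutual information between the selected features and the class."""
    selected = X[:, mask.astype(bool)]
    if selected.shape[1] == 0:
        return 0.0
    return float(mutual_info_classif(selected, y).mean())


def topology_preservation(X, mask, k=10):
    """Average Jaccard overlap between each sample's k nearest neighbors computed
    in the full feature space and in the reduced feature space."""
    reduced = X[:, mask.astype(bool)]
    if reduced.shape[1] == 0:
        return 0.0

    # k + 1 neighbors because each point is returned as its own nearest neighbor.
    _, full_idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    _, red_idx = NearestNeighbors(n_neighbors=k + 1).fit(reduced).kneighbors(reduced)

    overlaps = []
    for a, b in zip(full_idx[:, 1:], red_idx[:, 1:]):  # drop the point itself
        shared = len(set(a) & set(b))
        overlaps.append(shared / (2 * k - shared))      # Jaccard of the two k-NN sets
    return float(np.mean(overlaps))
```

An evolutionary multiobjective optimizer such as NSGA-II would then treat scores of this kind (together with, for example, the number of selected features) as objectives to be optimized simultaneously over candidate masks, yielding a Pareto front of trade-off feature subsets.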




Notes

  1. https://archive.ics.uci.edu/ml/datasets.html.

  2. https://archive.ics.uci.edu/ml/support/Optical+Recognition+of+Handwritten+Digits.

  3. http://www.face-rec.org/databases/.

  4. http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php.

  5. https://archive.ics.uci.edu/ml/datasets/ozone+level+detection.

  6. https://archive.ics.uci.edu/ml/datasets/ozone+level+detection.

  7. http://www.iitk.ac.in/kangal/codes.shtml.

  8. http://scikit-learn.org/stable/modules/generated/sklearn.metrics.


Acknowledgements

No funding was involved in this work. The authors would like to acknowledge the support of the Indian Institute of Technology Patna in conducting this research.

Author information


Corresponding author

Correspondence to Sriparna Saha.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Human and animal rights

We have not performed any experiments involving humans or animals.

Additional information

Communicated by V. Loia.


About this article


Cite this article

Saha, S., Kaur, M. Identification of topology-preserving, class-relevant feature subsets using multiobjective optimization. Soft Comput 23, 4717–4733 (2019). https://doi.org/10.1007/s00500-018-3122-0

