Identification of topology-preserving, class-relevant feature subsets using multiobjective optimization
In this work, a multiobjective feature selection technique is proposed that utilizes different quality measures to evaluate the goodness of the reduced feature set. Two perspectives are incorporated into the selection process: (1) the selected subset of features should not destroy the geometric distribution of the sample space, i.e., the neighborhood topology should be preserved in the reduced feature space; and (2) the selected feature subset should have minimal redundancy and high correlation with the classes. To capture the first aspect, the concept of shared nearest-neighbor distance is utilized; to capture the second, several information-theoretic quality measures are employed, including normalized mutual information, correlation with the class attribute, information gain, and entropy. A multiobjective framework is used to optimize these measures, individually and in different combinations, in order to reduce the feature set. The approach is evaluated on six publicly available data sets with respect to different classifiers, and the results conclusively demonstrate the potency of utilizing both types of objective functions in reducing the feature set. Several performance metrics, such as accuracy, redundancy, and Jaccard score, are used to measure the quality of the selected feature subset in comparison with several state-of-the-art techniques. Experimental results further illustrate that there is no universal model (i.e., no single combination of objective functions) that performs well over all the data sets with respect to every quality measure; in general, however, optimization of all objective functions jointly (the PMCI model) consistently performs well across all the data sets.
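To make the two kinds of objectives concrete, the following minimal sketch computes (a) a normalized-mutual-information relevance score between a discretized feature and the class labels, and (b) a shared nearest-neighbor overlap between the k-NN lists of two samples, which underlies SNN-based secondary distances. All function names and the sqrt-normalization for NMI are illustrative choices, not taken from the paper:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(X) of a discrete label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def normalized_mutual_info(x, y):
    """NMI(X; Y) = I(X; Y) / sqrt(H(X) * H(Y)) for discrete sequences.

    Used here as a feature-relevance score: x is a discretized feature
    column, y is the class attribute. Returns a value in [0, 1].
    """
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))          # joint entropy H(X, Y)
    mi = hx + hy - hxy                      # I(X; Y)
    denom = math.sqrt(hx * hy)
    return mi / denom if denom > 0 else 0.0

def snn_overlap(neighbors_a, neighbors_b):
    """Shared nearest-neighbor similarity between two samples,
    given their k-nearest-neighbor index lists: the size of the
    intersection of the two lists."""
    return len(set(neighbors_a) & set(neighbors_b))
```

A feature subset that preserves neighborhood topology keeps each sample's k-NN list (and hence pairwise SNN overlaps) close to those computed in the full feature space, while the NMI score rewards features that carry information about the class.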
Keywords: Feature selection · Shared nearest-neighbor distance · Normalized mutual information · Correlation · Entropy · Information gain · Multiobjective optimization · Jaccard score · Redundancy reduction
No funding was involved in this work. The authors would like to acknowledge the support of the Indian Institute of Technology Patna in conducting this research.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This work did not involve any experiments on humans or animals.