Abstract
Feature selection is a critical part of any machine learning project involving high-dimensional data sets. Selecting an optimal subset of important features reduces execution time and increases the predictive ability of the machine learning model. This paper presents a novel graph-based feature selection algorithm for unsupervised learning. Unlike many algorithms that use correlation as the measure of dependency between features, the proposed algorithm derives feature dependency using an information-theoretic approach. The proposed algorithm, Graph-based Information-Theoretic Approach for Unsupervised Feature Selection (GITAUFS), generates multiple minimal vertex covers (MVCs) of the feature graph and evaluates them to find the one best suited to the learning task. In our experimental setup comprising 13 benchmark data sets, GITAUFS shows a 10% increase in silhouette width along with a significant feature reduction of 90.62% compared to the next best performing algorithm.
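The pipeline the abstract outlines — score pairwise feature dependency with an information-theoretic measure, connect strongly dependent features in a graph, then generate and compare several minimal vertex covers — can be sketched as follows. This is a minimal illustration only: the chapter's actual MI estimator, edge-weighting threshold, cover-generation procedure, and silhouette-based evaluation are not given here, so the plug-in MI estimate, the `threshold` parameter, the randomized greedy cover search, and the placeholder `score` function are all assumptions.

```python
# Hypothetical sketch of a GITAUFS-style pipeline; all parameter choices
# below are illustrative assumptions, not the chapter's actual method.
import math
import random
from itertools import combinations

def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = {}, {}, {}
    for a, b in zip(x, y):
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
        pxy[(a, b)] = pxy.get((a, b), 0) + 1
    mi = 0.0
    for (a, b), c in pxy.items():
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return max(mi, 0.0)  # clamp tiny negative rounding error

def feature_graph(features, threshold):
    """Edge (i, j) wherever the MI between features i and j exceeds threshold."""
    return [(i, j) for i, j in combinations(range(len(features)), 2)
            if mutual_information(features[i], features[j]) > threshold]

def random_greedy_vertex_cover(edges, rng):
    """Classic 2-approximation: take both endpoints of a random uncovered edge."""
    cover, remaining = set(), list(edges)
    while remaining:
        i, j = rng.choice(remaining)
        cover |= {i, j}
        remaining = [e for e in remaining if i not in e and j not in e]
    return cover

def select_features(features, threshold=0.2, n_covers=10, score=len, seed=0):
    """Generate several vertex covers and keep the best-scoring one.

    `score` stands in for the chapter's task-based evaluation (e.g.
    silhouette width); here it simply prefers the smallest cover.
    """
    rng = random.Random(seed)
    edges = feature_graph(features, threshold)
    covers = [random_greedy_vertex_cover(edges, rng) for _ in range(n_covers)]
    return min(covers, key=score)
```

Because the greedy cover construction is randomized, repeated runs yield different candidate covers, which is what makes the "generate many, evaluate, keep the best" step meaningful.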
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
Cite this chapter
Kundu, S.S., Das, A., Das, A.K. (2020). Unsupervised Feature Selection Using Information-Theoretic Graph-Based Approach. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Algorithms in Machine Learning Paradigms. Studies in Computational Intelligence, vol 870. Springer, Singapore. https://doi.org/10.1007/978-981-15-1041-0_2
Print ISBN: 978-981-15-1040-3
Online ISBN: 978-981-15-1041-0