Unsupervised Feature Selection Using Information-Theoretic Graph-Based Approach

Algorithms in Machine Learning Paradigms

Part of the book series: Studies in Computational Intelligence ((SCI,volume 870))

Abstract

Feature selection is a critical part of any machine learning project involving data sets with high dimensionality. Selecting an optimal subset of important features reduces execution time and improves the predictive ability of the machine learning model. This paper presents a novel graph-based feature selection algorithm for unsupervised learning. Unlike many algorithms that use correlation as a measure of dependency between features, the proposed algorithm derives feature dependency using an information-theoretic approach. The proposed algorithm, Graph-based Information-Theoretic Approach for Unsupervised Feature Selection (GITAUFS), generates multiple minimal vertex covers (MVC) of the feature graph and evaluates them to find the best one in the context of the learning task. In our experimental setup comprising 13 benchmark data sets, GITAUFS has shown a 10% increase in the silhouette width value along with a significant feature reduction of 90.62% compared to the next best performing algorithm.
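
The abstract does not spell out how the feature graph or its vertex covers are built, so the following is only a minimal sketch of the general idea under stated assumptions, not the GITAUFS procedure itself. It assumes discrete-valued features, uses mutual information as the information-theoretic dependency measure, joins two features by an edge when their mutual information exceeds a cut-off, and obtains a minimal (not minimum) vertex cover by a greedy pass followed by pruning. The function names, the `threshold` parameter, and the `order` argument are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(x, y):
    """Mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum p(a,b) * log2( p(a,b) / (p(a) p(b)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def feature_graph(columns, threshold):
    """Edge (i, j) when features i and j share more than `threshold` bits.

    `threshold` is an assumed cut-off, not a value from the chapter.
    """
    return {
        (i, j)
        for i, j in combinations(range(len(columns)), 2)
        if mutual_information(columns[i], columns[j]) > threshold
    }


def minimal_vertex_cover(edges, order):
    """Greedy vertex cover, then pruning in the given vertex order.

    After pruning, no vertex can be dropped without uncovering an edge.
    Different orders can yield different minimal covers.
    """
    cover = set()
    for u, v in sorted(edges):
        if u not in cover and v not in cover:
            cover.update((u, v))
    for v in order:
        if v not in cover:
            continue
        # Other endpoints of every edge incident to v.
        neighbours = [a if b == v else b for a, b in edges if v in (a, b)]
        if all(w in cover for w in neighbours):
            cover.discard(v)
    return cover


# Toy data: feature 1 duplicates feature 0, feature 3 mirrors feature 2.
columns = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
edges = feature_graph(columns, threshold=0.5)
print(sorted(minimal_vertex_cover(edges, order=[0, 1, 2, 3])))
```

Changing `order` produces alternative minimal covers of the same graph, which is one simple way to obtain several candidates. Ranking those candidates for the learning task, for example by the silhouette width of a clustering run on each candidate feature subset, is left out of this sketch.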

Author information

Corresponding author

Correspondence to Sagarika Saroj Kundu.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Kundu, S.S., Das, A., Das, A.K. (2020). Unsupervised Feature Selection Using Information-Theoretic Graph-Based Approach. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Algorithms in Machine Learning Paradigms. Studies in Computational Intelligence, vol 870. Springer, Singapore. https://doi.org/10.1007/978-981-15-1041-0_2

  • DOI: https://doi.org/10.1007/978-981-15-1041-0_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1040-3

  • Online ISBN: 978-981-15-1041-0

  • eBook Packages: Engineering, Engineering (R0)
