A hybrid approach using rough set theory and hypergraph for feature selection on high-dimensional medical datasets
The "curse of dimensionality", driven by the massive generation of high-dimensional medical datasets from various biomedical applications, complicates the data analysis needed for precise medical diagnosis. An efficient feature selection technique that finds the optimal feature subset is a prominent solution to this challenge: it improves the accuracy of the learning model and reduces its computational complexity. State-of-the-art feature selection techniques based on heuristic and statistical functions face significant challenges in classification accuracy, time complexity, and related measures. Hence, this paper presents a Rough Set Theory and Hypergraph (RSHGT)-based feature selection technique to identify the optimal feature subset for accurate medical diagnosis. Experimental validation on six medical datasets from the Kent Ridge Biomedical dataset repository demonstrates the efficiency of RSHGT in terms of reduct size, accuracy, precision, recall, and time complexity.
Keywords: Hypergraph, Rough set theory (RST), Vertex linearity, Minimal transversal, Medical diagnosis
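To make the link between rough set reducts and minimal transversals concrete, the following is a minimal sketch of the general textbook idea, not the RSHGT algorithm itself, whose details are not given here. It assumes a small discrete decision table; the function names `discernibility_hyperedges` and `minimal_transversals`, the toy data, and the brute-force search are all illustrative assumptions. Each pair of objects with different class labels contributes one hyperedge (the features on which the two objects differ), and any feature subset that intersects every hyperedge preserves the ability to tell those objects apart.

```python
from itertools import combinations


def discernibility_hyperedges(rows, decisions):
    """Build one hyperedge per pair of objects with different decisions.

    Each hyperedge is the set of feature indices on which the two objects differ.
    """
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
            if diff:  # identical objects with different labels yield no edge
                edges.add(diff)
    return edges


def minimal_transversals(edges, n_features):
    """Brute-force search for minimal feature sets that hit every hyperedge.

    Exponential in n_features, so suitable only for toy examples.
    """
    if not edges:
        return [frozenset()]
    found = []
    for size in range(1, n_features + 1):
        for cand in combinations(range(n_features), size):
            s = frozenset(cand)
            hits_all = all(s & e for e in edges)
            # Skip supersets of transversals that were already found.
            if hits_all and not any(t <= s for t in found):
                found.append(s)
    return found


# Hypothetical toy decision table: 4 objects, 3 binary features, 2 classes.
rows = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 0)]
decisions = [1, 1, 0, 0]

edges = discernibility_hyperedges(rows, decisions)
reducts = minimal_transversals(edges, n_features=3)
print(sorted(sorted(e) for e in edges))    # [[0, 1], [0, 2]]
print([sorted(r) for r in reducts])        # [[0], [1, 2]]
```

In this toy table, feature 0 alone separates the two classes, and so does the pair of features 1 and 2. Exhaustive enumeration like this does not scale to high-dimensional data, which is why practical methods rely on more efficient strategies.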
This work was supported by The Department of Science and Technology – India, and TATA Realty – SASTRA Srinivasa Ramanujan Research Cell (Grant No: SR/FST/MSI-107/2015, MRT/2017/000155, and SR/FST/ETI-349/2013).
Compliance with ethical standards
Conflict of interest
All the authors declare that they do not have any conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.