Soft Computing, Volume 23, Issue 23, pp 12655–12672

A hybrid approach using rough set theory and hypergraph for feature selection on high-dimensional medical datasets

  • M. R. Gauthama Raman
  • Somu Nivethitha
  • Krithivasan Kannan
  • V. S. Shankar Sriram

Methodologies and Application

Abstract

The ‘curse of dimensionality’, driven by the massive generation of high-dimensional medical datasets from biomedical applications, complicates the data analysis needed for precise medical diagnosis. An efficient feature selection technique that finds the optimal feature subset is a prominent solution to this challenge: it improves the accuracy of the learning model and reduces its computational complexity. State-of-the-art feature selection techniques based on heuristic and statistical functions face significant challenges in classification accuracy, time complexity, and related measures. Hence, this paper presents a Rough Set Theory and Hypergraph (RSHGT)-based feature selection technique to identify the optimal feature subset for accurate medical diagnosis. Experimental validation on six medical datasets from the Kent Ridge Biomedical dataset repository demonstrates the efficiency of RSHGT in terms of reduct size, accuracy, precision, recall, and time complexity.
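
For orientation only, the sketch below illustrates the general link between rough set reducts and minimal transversals of a hypergraph. It is not the RSHGT algorithm, whose details this page does not give. It assumes a small, discrete, consistent decision table; the function names `discernibility_edges` and `minimal_transversals` and the toy data are hypothetical, and the brute-force search is only feasible for a handful of features.

```python
from itertools import combinations


def discernibility_edges(rows, labels):
    """Build hyperedges: for each pair of objects with different class
    labels, the set of feature indices on which the two objects differ."""
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if labels[i] != labels[j]:
            diff = frozenset(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
            if diff:  # skip pairs that no feature can tell apart
                edges.add(diff)
    return edges


def minimal_transversals(edges, n_features):
    """Enumerate minimal hitting sets by increasing size (brute force).
    Each result intersects every hyperedge and has no smaller hitting subset."""
    found = []
    for size in range(n_features + 1):
        for cand in combinations(range(n_features), size):
            s = set(cand)
            if any(t <= s for t in found):
                continue  # a subset already hits every edge, so s is not minimal
            if all(s & e for e in edges):
                found.append(frozenset(s))
    return found


# Toy decision table (assumed data): three binary features, two classes.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
labels = ["A", "A", "B", "B"]

edges = discernibility_edges(rows, labels)
reducts = minimal_transversals(edges, n_features=3)
print(reducts)  # feature 0 alone, or features 1 and 2 together
```

In this toy table, feature 0 separates the two classes by itself, and so does the pair of features 1 and 2; each is a minimal set of features that distinguishes every pair of objects from different classes.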

Keywords

  • Hypergraph
  • Rough set theory (RST)
  • Vertex linearity
  • Minimal transversal
  • Medical diagnosis

Notes

Funding

This work was supported by The Department of Science and Technology – India, and TATA Realty – SASTRA Srinivasa Ramanujan Research Cell (Grant No: SR/FST/MSI-107/2015, MRT/2017/000155, and SR/FST/ETI-349/2013).

Compliance with ethical standards

Conflict of interest

All the authors declare that they do not have any conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Centre for Information Super Highway (CISH), School of Computing, SASTRA Deemed to be University, Thanjavur, India
  2. Data Science Laboratory, Department of Mathematics, SASTRA Deemed to be University, Thanjavur, India
  3. iTrust, Centre for Research in Cyber Security, Singapore University of Technology and Design, Singapore, Singapore
  4. Smart Energy Informatics Lab (SEIL), Department of Computer Science and Engineering, Indian Institute of Technology-Bombay, Mumbai, India