Integrating multiple methods to enhance medical data classification

  • Balasaheb Tarle
  • Sanjay Chintakindi
  • Sudarson Jena
Original Paper


In medical data classification, data reduction and improved classification performance are currently the key issues. Existing medical data classification methods first pre-process the data and then perform feature selection; without feature selection, the process is more time-consuming and less accurate. Here we propose two algorithms for enhancing classification performance on medical data. In the first proposed method, the Bag of Words technique is used for better feature subset selection. Subsequently, a hybrid Fuzzy-Neural Network approach is used that can handle imprecision in the data during classification. This combination of a feature selection technique and a Fuzzy-Neural Network classifier gives enhanced classification accuracy. In the second proposed algorithm, we integrate a data cleaning technique as a pre-processing step to improve data quality, along with Bag of Words and the Fuzzy-Neural Network. This method performs classification on clean, filtered data with an appropriately reduced feature set, which results in more accurate classification than the existing methods. Thus, the proposed approaches address three issues: removing noise in the data, selecting an optimal feature subset, and handling imprecision in the data. A comparative study on various medical datasets shows that, in terms of accuracy, the two proposed algorithms perform better than existing techniques, with enhancements of around 3% and 17%, respectively. In addition, the performance of the Bag of Words feature selection method used in the proposed system is compared with two feature selection methods, LSFS and SFFS.
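
As a rough illustration of two of the ideas named in the abstract, data cleaning and handling imprecision, the sketch below drops incomplete records and converts a crisp measurement into fuzzy membership degrees of the kind a fuzzy-neural classifier could take as input. It is not the authors' implementation: the triangular membership shape, the three labels (low, medium, high), the example value range, and all function names are assumptions made for illustration only.

```python
from typing import Dict, List, Optional


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Degree of membership of x in a triangular fuzzy set.

    a and c are the feet of the triangle (membership 0), b is the peak
    (membership 1). Assumes a < b < c.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def clean(records: List[List[Optional[float]]]) -> List[List[float]]:
    """Hypothetical cleaning step: keep only records with no missing values."""
    return [r for r in records if all(v is not None for v in r)]


def fuzzify(value: float, lo: float, hi: float) -> Dict[str, float]:
    """Map a crisp value in [lo, hi] to low/medium/high membership degrees.

    The three overlapping triangles are an assumed partition; their peaks
    sit at lo, the midpoint, and hi respectively.
    """
    mid = (lo + hi) / 2
    half = mid - lo
    return {
        "low": triangular(value, lo - half, lo, mid),
        "medium": triangular(value, lo, mid, hi),
        "high": triangular(value, mid, hi, hi + half),
    }


# Example: two records (feature values are made up), one with a missing entry.
records = [[120.0, 63.0], [None, 58.0]]
usable = clean(records)

# Fuzzify the first feature of each usable record over an assumed range 100..200.
memberships = [fuzzify(r[0], 100.0, 200.0) for r in usable]
```

A value of 120 on the assumed 100 to 200 range belongs partly to "low" and partly to "medium" rather than to a single crisp bin, which is the sense in which a fuzzy front end tolerates imprecise measurements.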


Classifier; Fuzzy-neural network; Bag of words; Medical data classification




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. CSE Department, GITAM (Deemed to be University, Visakhapatnam), Hyderabad, India
  2. School of Technology, GITAM (Deemed to be University, Visakhapatnam), Hyderabad, India
  3. Computer Engineering and Applications, SUIIT, Sambalpur University, Sambalpur, India
