Missing Values and Class Prediction Based on Mutual Information and Supervised Similarity

  • Nagalakshmi K.
  • S. Suriya
Conference paper


In recent times, the use of data mining techniques has increased tremendously because of the growth in available data. These techniques have been applied to many research problems, but most of them face a common difficulty: missing data values. In research, large datasets are processed to test algorithms, and instances with missing values are either ignored or filled with default values during pre-processing. Neither practice is sound. In this chapter, a novel prediction technique is proposed that predicts the missing values of a given dataset or dataset sample by calculating mutual information, supervised similarity, and cosine similarity. The proposed approach estimates the missing values accurately, as tested on a sample cancer dataset with missing gene values. The technique can also be used to predict the class values of new instances of a dataset. The experiments show that the predicted missing values and class labels agree with the existing gene subsets, indicating that the predictions are reliable and accurate.
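The abstract names the quantities involved but gives no formulas. As a minimal sketch, assuming discrete (for example, binned) expression values and the standard textbook definitions, mutual information and cosine similarity could be computed as below. The function names and the similarity-weighted imputation rule are illustrative assumptions, not the authors' procedure; supervised similarity is not defined in the abstract and is left out.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def cosine_similarity(u, v):
    """Cosine of the angle between two numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def impute_missing(target, complete_rows, missing_idx):
    """Hypothetical imputation rule (an assumption, not the paper's method):
    estimate target[missing_idx] as the mean of that column over complete_rows,
    weighted by each row's cosine similarity to the target on the observed columns.
    """
    observed = [i for i in range(len(target)) if i != missing_idx]
    target_obs = [target[i] for i in observed]
    weighted_sum = 0.0
    weight_total = 0.0
    for row in complete_rows:
        # Negative similarities are clipped to zero so they do not contribute.
        w = max(cosine_similarity(target_obs, [row[i] for i in observed]), 0.0)
        weighted_sum += w * row[missing_idx]
        weight_total += w
    if weight_total == 0:
        # No similar row found: fall back to the plain column mean.
        return sum(r[missing_idx] for r in complete_rows) / len(complete_rows)
    return weighted_sum / weight_total
```

A call such as `impute_missing([1.0, 2.0, None], rows, 2)` fills the third value of an instance from the complete rows in `rows`. Because cosine similarity ignores vector magnitude, rows that are scaled copies of one another receive equal weight, which is one reason a real method might combine it with other measures.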


Mining techniques, Class prediction, Supervised similarity, Mutual information, Missing value prediction



  • Artificial bee colony
  • Association for Computing Machinery
  • Gaussian function
  • K-nearest neighbor
  • Multi-objective artificial bee colony
  • Probability density function
  • Support vector machine



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Nagalakshmi K. (1)
  • S. Suriya (2)
  1. Sethu Institute of Technology, Virudhunagar, India
  2. Department of Computer Science and Engineering, PSG College of Technology, Coimbatore, India
