Abstract
The use of data mining techniques has grown sharply in recent years as the volume of available data has increased. These techniques serve many research purposes, but most of them share a common problem: missing values in the data. When large datasets are processed to evaluate algorithms, instances with missing values are typically either ignored or filled with default values during pre-processing, and neither practice is satisfactory. In this chapter, a novel prediction technique is proposed that predicts the missing values of a given dataset or dataset sample by calculating mutual information, supervised similarity, and cosine similarity. The proposed approach calculated the missing values accurately in experiments on a sample cancer dataset with missing gene values. The same technique can also be used to predict the class values of new instances of a dataset. The experiments show that the predicted missing values and class labels coincide with the existing gene subsets, which indicates that the predictions are reliable and accurate.
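The abstract names the measures involved but does not say how they are combined, so the following is only a minimal sketch of two of them, not the chapter's method. It assumes discrete values for mutual information and fills a missing entry with a cosine-similarity-weighted average of the k most similar complete rows. The function names, the parameter `k`, and the weighting rule are all hypothetical choices made for illustration. Supervised similarity is omitted because the abstract does not define it.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum over observed pairs: p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def impute_missing(target, complete_rows, k=2):
    """Fill None entries of `target` from the k most similar complete rows.

    Similarity is computed only over the positions where `target` is observed.
    This weighting scheme is an illustrative assumption.
    """
    observed = [i for i, v in enumerate(target) if v is not None]
    target_obs = [target[i] for i in observed]

    scored = [
        (cosine_similarity(target_obs, [row[i] for i in observed]), row)
        for row in complete_rows
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    neighbours = scored[:k]

    # Negative similarities are clipped to zero so they cannot act as weights.
    weights = [max(sim, 0.0) for sim, _ in neighbours]
    total = sum(weights)

    filled = list(target)
    for j, value in enumerate(target):
        if value is None:
            if total == 0.0:
                # Fall back to the plain column mean when no row is similar.
                filled[j] = sum(row[j] for row in complete_rows) / len(complete_rows)
            else:
                filled[j] = sum(w * row[j] for w, (_, row) in zip(weights, neighbours)) / total
    return filled


# Example: the third "gene" value of the first sample is missing.
rows = [[1.0, 2.0, 3.0], [2.0, 1.0, 9.0], [-1.0, -2.0, 5.0]]
print(impute_missing([1.0, 2.0, None], rows, k=1))
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
```

In a setup like this, mutual information between each gene and the class label could be used to choose which columns enter the similarity computation, but that selection step is left out here because the abstract does not describe it.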
Abbreviations
- ABC: Artificial bee colony
- ACM: Association for Computing Machinery
- GF: Gaussian function
- k-NN: k-nearest neighbor
- MOABC: Multi-objective artificial bee colony
- PDF: Probability density function
- SVM: Support vector machine
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
K., N., Suriya, S. (2020). Missing Values and Class Prediction Based on Mutual Information and Supervised Similarity. In: Kumar, L., Jayashree, L., Manimegalai, R. (eds) Proceedings of International Conference on Artificial Intelligence, Smart Grid and Smart City Applications. AISGSC 2019. Springer, Cham. https://doi.org/10.1007/978-3-030-24051-6_54
DOI: https://doi.org/10.1007/978-3-030-24051-6_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24050-9
Online ISBN: 978-3-030-24051-6
eBook Packages: Engineering, Engineering (R0)