Missing Values and Class Prediction Based on Mutual Information and Supervised Similarity

K., Nagalakshmi; Suriya, S.

doi:10.1007/978-3-030-24051-6_54

Nagalakshmi K. & S. Suriya

Included in the following conference series:

International Conference on Artificial Intelligence, Smart Grid and Smart City Applications

Abstract

The use of data mining techniques has grown sharply in recent years as the volume of available data has increased. These techniques serve many research purposes, but most of them face a common problem: missing values. When large datasets are processed to evaluate algorithms, instances with missing values are usually either ignored or filled with default values during pre-processing. Neither practice is sound. In this chapter, a novel prediction technique is proposed that predicts the missing values of a given dataset or dataset sample by calculating mutual information, supervised similarity, and cosine similarity. In experiments on a sample cancer dataset with missing gene values, the proposed approach estimated the missing values accurately. The technique can also be used to predict the class values of new instances of a dataset. The experiments show that the predicted missing values and class labels coincide with the existing gene subsets, and the predictions are therefore considered reliable and accurate.
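The abstract names the quantities involved but does not give the procedure, so the following is only a minimal sketch of how two of them might be combined, not the authors' method. It assumes discrete values for the mutual information estimate, marks missing entries with `None`, and fills a gap with a cosine-similarity-weighted mean of the other instances. The function names `mutual_information`, `cosine_similarity`, and `impute` are hypothetical, and supervised similarity is left out because the abstract does not define it.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over the observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def cosine_similarity(a, b):
    """Cosine similarity over the positions observed (not None) in both vectors."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    dot = sum(u * v for u, v in pairs)
    norm_a = math.sqrt(sum(u * u for u, _ in pairs))
    norm_b = math.sqrt(sum(v * v for _, v in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def impute(rows, r, c):
    """Estimate rows[r][c] as a similarity-weighted mean of column c in other rows.

    Assumption for this sketch: negative similarities are clipped to zero so
    that dissimilar instances do not contribute to the estimate.
    """
    num = den = 0.0
    for i, row in enumerate(rows):
        if i == r or row[c] is None:
            continue
        w = max(cosine_similarity(rows[r], row), 0.0)
        num += w * row[c]
        den += w
    return num / den if den else None
```

In a setting like the one described, a mutual information score between each gene and the class label could be used to decide which genes enter the similarity computation, and `impute(rows, r, c)` would then be called for each missing entry. That pairing is an assumption made for illustration.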

Abbreviations

ABC: Artificial bee colony
ACM: Association for Computing Machinery
GF: Gaussian function
k-NN: k-nearest neighbor
MOABC: Multi-objective artificial bee colony
PDF: Probability density function
SVM: Support vector machine

Author information

Authors and Affiliations

Sethu Institute of Technology, Virudhunagar, Tamil Nadu, India
Nagalakshmi K.
Department of Computer Science and Engineering, PSG College of Technology, Coimbatore, Tamil Nadu, India
S. Suriya

Editor information

Editors and Affiliations

Department of Electrical and Electronics Engineering, PSG College of Technology, Coimbatore, Tamil Nadu, India
L. Ashok Kumar
Department of Computer Science and Engineering, PSG College of Technology, Coimbatore, Tamil Nadu, India
L. S. Jayashree
Department of Information Technology, PSG College of Technology, Coimbatore, Tamil Nadu, India
R. Manimegalai

Cite this paper

Nagalakshmi, K., Suriya, S. (2020). Missing Values and Class Prediction Based on Mutual Information and Supervised Similarity. In: Kumar, L., Jayashree, L., Manimegalai, R. (eds) Proceedings of International Conference on Artificial Intelligence, Smart Grid and Smart City Applications. AISGSC 2019. Springer, Cham. https://doi.org/10.1007/978-3-030-24051-6_54

DOI: https://doi.org/10.1007/978-3-030-24051-6_54
Published: 13 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24050-9
Online ISBN: 978-3-030-24051-6
eBook Packages: Engineering, Engineering (R0)
