Missing Values and Class Prediction Based on Mutual Information and Supervised Similarity

  • Conference paper
Proceedings of International Conference on Artificial Intelligence, Smart Grid and Smart City Applications (AISGSC 2019)

Abstract

The use of data mining techniques has grown sharply in recent years as the volume of available data has increased. These techniques serve many research purposes, but most of them share a common problem: missing data values. When large datasets are processed to evaluate algorithms, instances with missing values are typically either ignored or filled with default values during pre-processing. Both practices can bias the results. In this chapter, a novel prediction technique is proposed that predicts the missing values of a given dataset, or of a dataset sample, by computing mutual information, supervised similarity, and cosine similarity. The approach is tested on a sample cancer dataset with missing gene values, where it estimates the missing values accurately. The same technique can also predict the class values of new dataset instances. The experiments show that the predicted missing values and class labels agree with the existing gene subsets, which indicates that the predictions are reliable and accurate.
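The abstract names its building blocks but does not spell out the algorithm, so the following is only a minimal sketch of two of them under stated assumptions, not the chapter's method. It computes mutual information between two discrete sequences (for example, a discretized gene and the class label) and uses cosine similarity over the observed columns to weight complete instances when estimating one missing value. The function names, the weighted-mean imputation rule, and the toy data are all hypothetical; the supervised similarity measure is omitted because the abstract gives no definition for it.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(u * v for u, v in zip(a, b))
    norm_a = math.sqrt(sum(u * u for u in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def impute_missing(target, complete_rows, missing_idx):
    """Estimate target[missing_idx] from complete rows (illustrative rule only).

    Assumption: each complete row is weighted by its cosine similarity to
    the target over the observed columns; negative similarities are clipped
    to zero. This is a stand-in, not the rule used in the chapter.
    """
    observed = [i for i in range(len(target)) if i != missing_idx]
    t = [target[i] for i in observed]
    weights = [
        max(cosine_similarity(t, [row[i] for i in observed]), 0.0)
        for row in complete_rows
    ]
    total = sum(weights)
    if total == 0:
        # No similar row: fall back to the plain column mean.
        return sum(row[missing_idx] for row in complete_rows) / len(complete_rows)
    return sum(w * row[missing_idx] for w, row in zip(weights, complete_rows)) / total


# Toy data: three expression values per instance, the last one missing.
complete = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
estimate = impute_missing([1.0, 2.0, None], complete, missing_idx=2)
```

Because cosine similarity ignores vector magnitude, both complete rows in the toy data receive the same weight. A practical variant would likely combine it with a magnitude-aware or class-aware measure, which is presumably the role of the supervised similarity term in the chapter.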

Abbreviations

  • ABC: Artificial bee colony
  • ACM: Association for Computing Machinery
  • GF: Gaussian function
  • k-NN: k-nearest neighbor
  • MOABC: Multi-objective artificial bee colony
  • PDF: Probability density function
  • SVM: Support vector machine

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

K., N., Suriya, S. (2020). Missing Values and Class Prediction Based on Mutual Information and Supervised Similarity. In: Kumar, L., Jayashree, L., Manimegalai, R. (eds) Proceedings of International Conference on Artificial Intelligence, Smart Grid and Smart City Applications. AISGSC 2019. Springer, Cham. https://doi.org/10.1007/978-3-030-24051-6_54

  • DOI: https://doi.org/10.1007/978-3-030-24051-6_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24050-9

  • Online ISBN: 978-3-030-24051-6

  • eBook Packages: Engineering, Engineering (R0)
