Correlated Cluster-Based Imputation for Treatment of Missing Values

  • Conference paper
  • In: Proceedings of the First International Conference on Computational Intelligence and Informatics

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 507)

Abstract

Improved imputation plays a major role in research on data pre-processing for data analysis. Missing values are commonly treated with traditional approaches such as attribute mean/mode substitution and cluster-based mean/mode substitution. These approaches concentrate mainly on the attribute that has the missing values. This paper presents a framework for correlated cluster-based imputation to improve the quality of data for data mining applications. We apply correlation analysis to the data set with respect to the attributes that have missing data. Based on the highly correlated attributes, the data set is divided into clusters using a suitable clustering technique, and the missing values are imputed with the corresponding cluster mean. The imputed data are analyzed with K-Nearest Neighbor (KNN) and J48 Decision Tree multi-class classifiers. The effectiveness of the imputation is assessed by comparison with attribute mean imputed data: with correlated cluster mean imputed data, the classifiers reach 100% accuracy.
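The steps named in the abstract (correlation analysis, clustering on correlated attributes, cluster mean substitution) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the correlation measure or the clustering technique, so Pearson correlation and a plain one-dimensional k-means are assumptions here, as are the function names `pearson`, `kmeans_1d`, and `correlated_cluster_impute`, the toy data, and the simplification that only one attribute has missing values and only its single most correlated attribute is used for clustering.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (assumed measure)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means (assumed clustering technique), deterministic start."""
    s = sorted(values)
    # Initial centers are taken at evenly spaced positions of the sorted values.
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def correlated_cluster_impute(rows, target, k=2):
    """Fill None entries in column `target` with the mean of their cluster.

    Clusters are formed on the attribute most correlated with `target`,
    measured on the rows where `target` is observed.
    """
    observed = [r for r in rows if r[target] is not None]
    t_obs = [r[target] for r in observed]
    others = [c for c in range(len(rows[0])) if c != target]
    best = max(others, key=lambda c: abs(pearson([r[c] for r in observed], t_obs)))
    labels = kmeans_1d([r[best] for r in rows], k)
    overall = sum(t_obs) / len(t_obs)  # fallback: plain attribute mean
    cluster_mean = {}
    for c in range(k):
        vals = [r[target] for r, l in zip(rows, labels)
                if l == c and r[target] is not None]
        cluster_mean[c] = sum(vals) / len(vals) if vals else overall
    return [
        [cluster_mean[l] if (i == target and v is None) else v
         for i, v in enumerate(r)]
        for r, l in zip(rows, labels)
    ]

# Toy data (hypothetical): column 1 has missing values, column 0 tracks it closely.
rows = [
    [1.0, 10.0, 5.0],
    [1.2, 12.0, 9.0],
    [0.9, None, 2.0],
    [5.0, 50.0, 7.0],
    [5.2, 52.0, 1.0],
    [4.9, None, 8.0],
]
imputed = correlated_cluster_impute(rows, target=1, k=2)
print(imputed[2][1], imputed[5][1])  # 11.0 51.0
```

On this toy data, attribute mean substitution would fill both gaps with 31.0, the mean of the four observed values, whereas the cluster means 11.0 and 51.0 follow the group each record belongs to. This is the kind of difference the comparison in the abstract is concerned with.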

Acknowledgments

The authors gratefully acknowledge the computational facility created in the college under DST’s FIST Programme (SR/FST/College-2009/2014(C)), which helped them carry out the work. The authors are also grateful to the anonymous reviewers for their constructive comments, which improved the quality of the paper. The authors thank the management of VBIT for their support and kind encouragement.

Author information

Corresponding author

Correspondence to Madhu Bala Myneni.

Copyright information

© 2017 Springer Science+Business Media Singapore

About this paper

Cite this paper

Myneni, M.B., Srividya, Y., Dandamudi, A. (2017). Correlated Cluster-Based Imputation for Treatment of Missing Values. In: Satapathy, S., Prasad, V., Rani, B., Udgata, S., Raju, K. (eds) Proceedings of the First International Conference on Computational Intelligence and Informatics. Advances in Intelligent Systems and Computing, vol 507. Springer, Singapore. https://doi.org/10.1007/978-981-10-2471-9_17

  • DOI: https://doi.org/10.1007/978-981-10-2471-9_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2470-2

  • Online ISBN: 978-981-10-2471-9

  • eBook Packages: Engineering, Engineering (R0)
