Abstract
Improved imputation plays a major role in research on data preprocessing for data analysis. Missing values are commonly treated with traditional approaches such as attribute mean/mode substitution and cluster-based mean/mode substitution. These approaches focus mainly on the attribute that contains the missing values. This paper presents a framework for correlated cluster-based imputation to improve the quality of data for data mining applications. We apply correlation analysis to the data set with respect to the attributes that have missing data. Based on the highly correlated attributes, the data set is divided into clusters using a suitable clustering technique, and the missing values are imputed with the corresponding cluster mean. This correlated cluster-based imputation improves the quality of data. The imputed data are analyzed with K-Nearest Neighbor (KNN) and J48 Decision Tree multi-class classifiers. The effectiveness of the imputation is confirmed by 100% classification accuracy on correlated cluster mean imputed data, compared with attribute mean imputed data.
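The three steps named in the abstract (correlation analysis, clustering on a highly correlated attribute, cluster-mean substitution) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of a single best-correlated attribute, the use of plain one-dimensional k-means, and the toy data are all assumptions made for illustration, since the abstract only says "suitable clustering techniques".

```python
import math
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mx = mean(p[0] for p in pairs)
    my = mean(p[1] for p in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy) if sx and sy else 0.0


def kmeans_1d(values, k, iters=20, seed=0):
    """Plain k-means on one attribute; returns a cluster label per value.

    Assumes `values` holds at least k distinct numbers.
    """
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(values)), k)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center, then recompute centers.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return labels


def correlated_cluster_impute(rows, target, k=2):
    """Return a copy of `rows` with None in column `target` replaced by
    the mean of the observed target values in the same cluster."""
    def column(j):
        return [r[j] for r in rows]

    # Step 1: pick the fully observed attribute most correlated with the target.
    candidates = [j for j in range(len(rows[0]))
                  if j != target and None not in column(j)]
    best = max(candidates, key=lambda j: abs(pearson(column(j), column(target))))

    # Step 2: cluster the records on that attribute.
    labels = kmeans_1d(column(best), k)

    # Step 3: substitute the cluster mean (fall back to the attribute mean
    # if a cluster has no observed target value).
    overall = mean(v for v in column(target) if v is not None)
    filled = [list(r) for r in rows]
    for c in range(k):
        observed = [r[target] for r, lab in zip(rows, labels)
                    if lab == c and r[target] is not None]
        fill = mean(observed) if observed else overall
        for i, lab in enumerate(labels):
            if lab == c and filled[i][target] is None:
                filled[i][target] = fill
    return filled


# Toy data (hypothetical): column 1 has missing values and tracks column 0.
rows = [
    [1.0, 10.0, 5.0],
    [1.2, 12.0, 9.0],
    [0.9, None, 1.0],
    [8.0, 80.0, 7.0],
    [8.3, 82.0, 2.0],
    [7.9, None, 6.0],
]
filled = correlated_cluster_impute(rows, target=1, k=2)
```

On this toy data the two missing entries are filled with 11.0 and 81.0, the means of their respective clusters, whereas attribute mean substitution would place 46.0 in both positions. This is the contrast the abstract draws between the two imputation strategies.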
Acknowledgments
The authors gratefully acknowledge the computational facility created in the college under DST’s FIST Programme (SR/FST/College-2009/2014(C)), which helped them to carry out the work. The authors are also grateful to the anonymous reviewers for their constructive comments, which improved the quality of the paper. The authors thank the management of VBIT for their support and kind encouragement.
Copyright information
© 2017 Springer Science+Business Media Singapore
About this paper
Cite this paper
Myneni, M.B., Srividya, Y., Dandamudi, A. (2017). Correlated Cluster-Based Imputation for Treatment of Missing Values. In: Satapathy, S., Prasad, V., Rani, B., Udgata, S., Raju, K. (eds) Proceedings of the First International Conference on Computational Intelligence and Informatics. Advances in Intelligent Systems and Computing, vol 507. Springer, Singapore. https://doi.org/10.1007/978-981-10-2471-9_17
DOI: https://doi.org/10.1007/978-981-10-2471-9_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2470-2
Online ISBN: 978-981-10-2471-9
eBook Packages: Engineering, Engineering (R0)