Correlated Cluster-Based Imputation for Treatment of Missing Values

Myneni, Madhu Bala; Srividya, Y.; Dandamudi, Akhil

doi:10.1007/978-981-10-2471-9_17

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 507)

Abstract

Improved imputation plays a major role in data pre-processing for data analysis. Missing values are usually treated with traditional approaches such as attribute mean/mode substitution and cluster-based mean/mode substitution. These approaches concentrate mainly on the attribute that has the missing values. This paper presents a framework for correlated cluster-based imputation to improve the quality of data for data mining applications. We apply correlation analysis to the data set with respect to the attributes that have missing data. Based on the highly correlated attributes, the data set is divided into clusters using a suitable clustering technique, and each missing value is imputed with the mean value of its cluster. The imputed data are analyzed with K-Nearest Neighbor (KNN) and J48 Decision Tree multi-class classifiers. The effectiveness of the imputation is confirmed by 100 % classification accuracy on correlated cluster-mean imputed data, in comparison with attribute-mean imputed data.
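
The steps described in the abstract (correlation analysis, clustering on the correlated attributes, cluster-mean imputation) can be illustrated with a minimal sketch. It is not the authors' implementation: the full method is not available in this preview, so the sketch assumes a single most-correlated attribute, a plain one-dimensional k-means as the "suitable clustering technique", and complete values in every attribute other than the target. All function names and the toy data are hypothetical.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def kmeans_1d(values, k, iters=50):
    """Cluster scalars with plain k-means; returns one label per value."""
    ordered = sorted(values)
    # Deterministic start: centres spread evenly over the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - centres[c]))

    for _ in range(iters):
        labels = [nearest(v) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(mean(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    return [nearest(v) for v in values]


def correlated_cluster_impute(rows, target, k=2):
    """Fill missing `target` values (None) with the mean of their cluster.

    Assumption: clusters are built on the one attribute most correlated
    with `target`, and all other attributes have no missing values.
    """
    observed = [r for r in rows if r[target] is not None]
    y = [r[target] for r in observed]
    candidates = [a for a in rows[0] if a != target]
    # Correlation is measured only on rows where the target is observed.
    best = max(candidates,
               key=lambda a: abs(pearson([r[a] for r in observed], y)))
    labels = kmeans_1d([r[best] for r in rows], k)
    overall = mean(y)  # fallback if a cluster has no observed target value
    filled = []
    for row, lab in zip(rows, labels):
        if row[target] is None:
            peers = [r[target] for r, l in zip(rows, labels)
                     if l == lab and r[target] is not None]
            row = {**row, target: mean(peers) if peers else overall}
        filled.append(row)
    return best, filled


# Toy data: `x` tracks `y` closely, `z` is only weakly related to it.
data = [
    {"x": 1.0, "z": 5.0, "y": 10.0},
    {"x": 2.0, "z": 1.0, "y": 12.0},
    {"x": 3.0, "z": 4.0, "y": None},
    {"x": 10.0, "z": 2.0, "y": 100.0},
    {"x": 11.0, "z": 3.0, "y": None},
    {"x": 12.0, "z": 6.0, "y": 104.0},
]
chosen, completed = correlated_cluster_impute(data, "y", k=2)
print(chosen, [r["y"] for r in completed])
```

On this toy data the two gaps are filled from their own clusters rather than with the single attribute mean of the observed `y` values (56.5), which is the contrast the abstract draws between the two imputation strategies.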

Acknowledgments

The authors gratefully acknowledge the computational facility created in the college under DST’s FIST Programme (SR/FST/College-2009/2014(C)), which helped them carry out this work. The authors are also grateful to the anonymous reviewers for their constructive comments, which improved the quality of the paper, and thank the management of VBIT for their support and kind encouragement.

Author information

Authors and Affiliations

Institute of Aeronautical Engineering, Hyderabad, India
Madhu Bala Myneni & Y. Srividya
NIIT University, Neemrana, Rajasthan, India
Akhil Dandamudi

Corresponding author

Correspondence to Madhu Bala Myneni.

Editor information

Editors and Affiliations

ANITS, Prof., Comp. Sci. & Engg. Dept. ANITS, Visakhapatnam, Andhra Pradesh, India
Suresh Chandra Satapathy
JNTUH College of Engg. HYD (Autonomous), Prof. & Head, Comp. Sci. & Engg. Dept. JNTUH College of Engg. HYD (Autonomous), Hyderabad, Telangana, India
V. Kamakshi Prasad
JNTUH College of Engg. HYD (Autonomous), Pro., Dept. Computer Science & Engg. JNTUH College of Engg. HYD (Autonomous), Hyderabad, Telangana, India
B. Padmaja Rani
SCIS, University of Hyderabad, Hyderabad, India
Siba K. Udgata
CMR Technical Campus, Hyderabad, India
K. Srujan Raju

Cite this paper

Myneni, M.B., Srividya, Y., Dandamudi, A. (2017). Correlated Cluster-Based Imputation for Treatment of Missing Values. In: Satapathy, S., Prasad, V., Rani, B., Udgata, S., Raju, K. (eds) Proceedings of the First International Conference on Computational Intelligence and Informatics. Advances in Intelligent Systems and Computing, vol 507. Springer, Singapore. https://doi.org/10.1007/978-981-10-2471-9_17

DOI: https://doi.org/10.1007/978-981-10-2471-9_17
Published: 01 December 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2470-2
Online ISBN: 978-981-10-2471-9
eBook Packages: Engineering, Engineering (R0)
