Abstract
Imputing missing values is an important step in data preprocessing. Datasets often contain missing values for a variety of reasons that arise during data collection, and a good imputation algorithm can increase the reliability of a dataset and reduce the impact of the missing values on it. In this paper, we propose a missing data imputation method for classification datasets based on an Artificial Neural Network (ANN). For each record that contains missing values, we build a list of candidate replacement values drawn from the complete records. Our ANN model is trained on the complete records and selects the most appropriate value from the list as the final result, based on the class label of the record being imputed. In our experiments on the Pima dataset, we compare our algorithm with the traditional single value imputation and mean value imputation methods. The results show that the proposed algorithm achieves better classification results when the dataset contains more missing values.
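The candidate-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the network architecture or the exact selection rule, so everything below is an assumption: the trained ANN is represented by any callable that maps a feature vector to class probabilities (`toy_model` is a hypothetical stand-in, not a trained network), candidates are the distinct values seen in the same column of the complete records, and the chosen candidate is the one that maximizes the predicted probability of the record's known label. Other missing entries are temporarily filled with the column mean so that the model always receives a full vector.

```python
from typing import Callable, Dict, List, Optional, Sequence

Record = Sequence[Optional[float]]  # None marks a missing value
Model = Callable[[Sequence[float]], Dict[int, float]]  # features -> class probabilities


def candidate_values(complete: Sequence[Sequence[float]], column: int) -> List[float]:
    """Distinct values observed in `column` across the complete records."""
    return sorted({row[column] for row in complete})


def column_mean(complete: Sequence[Sequence[float]], column: int) -> float:
    """Mean of `column` over the complete records (the mean-imputation baseline)."""
    return sum(row[column] for row in complete) / len(complete)


def impute_record(
    record: Record,
    label: int,
    complete: Sequence[Sequence[float]],
    predict_proba: Model,
) -> List[float]:
    """Fill each missing entry with the candidate that best supports `label`."""
    # Start from a mean-filled copy so the model never sees a missing entry.
    filled = [
        column_mean(complete, i) if value is None else value
        for i, value in enumerate(record)
    ]
    for col, value in enumerate(record):
        if value is not None:
            continue

        def score(candidate: float) -> float:
            # Probability the model assigns to the known label for this trial fill.
            trial = filled[:col] + [candidate] + filled[col + 1:]
            return predict_proba(trial).get(label, 0.0)

        filled[col] = max(candidate_values(complete, col), key=score)
    return filled


def toy_model(features: Sequence[float]) -> Dict[int, float]:
    """Hypothetical stand-in for a trained ANN: class 1 grows with feature 1."""
    p1 = min(1.0, max(0.0, features[1] / 10.0))
    return {0: 1.0 - p1, 1: p1}


complete_records = [[1.0, 2.0], [3.0, 8.0], [5.0, 5.0]]
imputed_pos = impute_record([3.0, None], 1, complete_records, toy_model)
imputed_neg = impute_record([3.0, None], 0, complete_records, toy_model)
```

In practice `toy_model` would be replaced by the prediction function of a network trained on the complete records. Scoring candidates against the known label is one plausible reading of "based on the label categories"; the paper itself may use a different rule.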
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61771390, 61771392, 61501373, and 61271279), the National Science and Technology Major Project (Grant Nos. 2016ZX03001018-004 and 2015ZX03002006-004), and the Fundamental Research Funds for the Central Universities (Grant No. 3102017ZY018).
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Wang, S., Li, B., Yang, M., Yan, Z. (2019). Missing Data Imputation for Machine Learning. In: Li, B., Yang, M., Yuan, H., Yan, Z. (eds) IoT as a Service. IoTaaS 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 271. Springer, Cham. https://doi.org/10.1007/978-3-030-14657-3_7
DOI: https://doi.org/10.1007/978-3-030-14657-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14656-6
Online ISBN: 978-3-030-14657-3
eBook Packages: Computer Science, Computer Science (R0)